Stationary Algorithmic Probability

Markus Müller



Abstract — Kolmogorov complexity and algorithmic probability 
are defined only up to an additive and a multiplicative constant, respectively,
since their actual values depend on the choice of the universal 
reference computer. In this paper, we analyze a natural approach 
to eliminate this machine-dependence. 

Our method is to assign algorithmic probabilities to the 
different computers themselves, based on the idea that "un- 
natural" computers should be hard to emulate. Therefore, we 
study the Markov process of universal computers randomly 
emulating each other. The corresponding stationary distribution, 
if it existed, would give a natural and machine-independent 
probability measure on the computers, and also on the binary 
strings. 

Unfortunately, we show that no stationary distribution exists 
on the set of all computers; thus, this method cannot eliminate 
machine-dependence. Moreover, we show that the reason for fail- 
ure has a clear and interesting physical interpretation, suggesting 
that every other conceivable attempt to get rid of those additive 
constants must fail in principle, too. 

However, we show that restricting to some subclass of comput- 
ers might help to get rid of some amount of machine-dependence 
in some situations, and the resulting stationary computer and 
string probabilities have beautiful properties. 

Index Terms — Algorithmic Probability, Kolmogorov Complex- 
ity, Markov Chain, Emulation, Emulation Complexity 

I. Introduction and Main Results 

SINCE algorithmic probability was first studied in the
1960s by Solomonoff, Levin, Chaitin and others (cf. [1], [2],
[3]), it has revealed a variety of interesting properties,
including applications in computer science, inductive inference
and statistical mechanics (cf. [4], [5], [6]). The algorithmic
probability of a binary string $s$ is defined as the probability
that a universal prefix computer $U$ outputs $s$ on random input,

$$P_U(s) := \sum_{x \in \{0,1\}^* : U(x) = s} 2^{-|x|}, \qquad (1)$$

where $|x|$ denotes the length of a binary string $x \in \{0,1\}^*$. It
follows from the Kraft inequality that

$$\sum_{s \in \{0,1\}^*} P_U(s) =: \Omega_U < 1,$$

where $\Omega_U$ is Chaitin's famous halting probability. So algorithmic
probability is a subnormalized probability distribution
or semimeasure on the binary strings. It is closely related to
prefix Kolmogorov complexity $K_U(s)$, which is defined [4] as
the length of the shortest computer program that outputs $s$:

$$K_U(s) := \min\{|x| \mid U(x) = s\}.$$

The relation between the two can be written as

$$K_U(s) = -\log P_U(s) + O(1), \qquad (2)$$
where the $O(1)$-term denotes equality up to an additive
constant. Both Kolmogorov complexity and algorithmic probability
depend on the choice of the universal reference computer
$U$. However, they do not depend on $U$ "too much": If $U$ and
$V$ are both universal prefix computers, then it follows from
the fact that one can emulate the other that

$$K_U(s) = K_V(s) + O(1),$$

i.e. the complexities $K_U$ and $K_V$ differ from each other only
up to an additive constant. Then Equation (2) shows that the
corresponding algorithmic probabilities differ only up to a
multiplicative constant.

This kind of "weak" machine independence is good enough 
for many applications: if the strings are long enough, then a 
fixed additive constant does not matter too much. However, 
there are many occasions where it would be desirable to 
get rid of those additive constants, and to eliminate the 
arbitrariness which comes from the choice of the universal 
reference computer. Examples are Artificial Intelligence [6]
and physics [7], where one often deals with finite and short
binary strings.

We start with a simple example, to show that the machine- 
dependence of algorithmic probability can be drastic, and also 
to illustrate the main idea of our approach. Suppose that $U_{\text{nice}}$
is a "natural" universal prefix computer, say, one which is
given by a Turing machine model that we might judge as
"simple". Now choose an arbitrary string $s$ consisting of a
million random bits; say, $s$ is obtained by a million tosses of a
fair coin. With high probability, there is no short program for
$U_{\text{nice}}$ which computes $s$ (otherwise toss the coin again and
use a different string $s$). We thus expect that

$$P_{U_{\text{nice}}}(s) \approx 2^{-1\,000\,000}.$$

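Now we define another prefix computer $U_{\text{bad}}$ as

$$U_{\text{bad}}(x) := \begin{cases} s & \text{if } x = 0, \\ \text{undefined} & \text{if } x = \lambda \text{ or } x = 0y \text{ with } y \neq \lambda, \\ U_{\text{nice}}(y) & \text{if } x = 1y. \end{cases}$$

The computer $U_{\text{bad}}$ is universal, since it emulates the universal
computer $U_{\text{nice}}$ if we just prepend a "1" to the input. Since
$U_{\text{bad}}(0) = s$, we have

$$P_{U_{\text{bad}}}(s) \geq \frac{1}{2}.$$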

Hence the algorithmic probability $P_U(s)$ depends drastically
on the choice of the universal computer $U$. Clearly, the
computer $U_{\text{bad}}$ seems quite unnatural, but in algorithmic
information theory, all the universal computers are created
equal: there is no obvious way to distinguish between them
and to say which one of them is a "better choice" than the
other.
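The effect of prepending a single bit can be checked on a toy model. The following Python sketch is only an illustration under stated assumptions: `u_nice` is a hypothetical stand-in (a simple prefix machine that reads a unary header $1^k 0$ and then outputs the next $k$ bits; it is not universal), inputs are strings over "0" and "1", and `None` plays the role of "undefined". Summing $2^{-|x|}$ over all inputs up to a length bound gives a lower bound on $P_C(s)$.

```python
from fractions import Fraction
from itertools import product


def u_nice(x):
    """Hypothetical toy prefix machine (NOT universal): on input 1^k 0 y with |y| = k, output y."""
    k = 0
    while k < len(x) and x[k] == "1":
        k += 1
    if k == len(x):              # header never terminated by a 0: undefined
        return None
    payload = x[k + 1:]
    return payload if len(payload) == k else None


def make_u_bad(s, u):
    """U_bad as in the text: input 0 -> s, input 1y -> u(y), everything else undefined."""
    def u_bad(x):
        if x == "0":
            return s
        if x.startswith("1"):
            return u(x[1:])
        return None              # x = lambda, or x = 0y with y nonempty
    return u_bad


def prob_lower_bound(c, s, max_len):
    """Sum of 2^-|x| over all x with |x| <= max_len and c(x) = s (a lower bound on P_c(s))."""
    total = Fraction(0)
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            if c("".join(bits)) == s:
                total += Fraction(1, 2 ** n)
    return total


s = "10110"                      # stands in for the million random bits
u_bad = make_u_bad(s, u_nice)
print(prob_lower_bound(u_nice, s, 11))   # 1/2048: the only program is 11111 0 10110
print(prob_lower_bound(u_bad, s, 12))    # 2049/4096 = 1/2 + 2^-12
```

With a 5-bit string in place of a million random bits, the toy $U_{\text{nice}}$ needs an 11-bit program, while $U_{\text{bad}}$ puts probability $\frac{1}{2}$ on the single input 0.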
So what is "the" algorithmic probability of the single string
$s$? It seems clear that $2^{-1\,000\,000}$ is a better answer than $\frac{1}{2}$,
but the question is how we can make mathematical sense of
this statement. How can we give a sound formal meaning
to the statement that $U_{\text{nice}}$ is more "natural" than $U_{\text{bad}}$? A
possible answer is that in the process of randomly constructing
a computer from scratch, one is very unlikely to end up with
$U_{\text{bad}}$, while there is some larger probability to encounter $U_{\text{nice}}$.

This suggests that we might hope to find some natural
probability distribution $\mu$ on the universal computers, in such
a way that $\mu(U_{\text{bad}}) \ll \mu(U_{\text{nice}})$. Then we could define the
"machine-independent" algorithmic probability $P(s)$ of some
string $s$ as the weighted average of all algorithmic probabilities
$P_U(s)$,

$$P(s) := \sum_{U \text{ universal}} \mu(U) P_U(s). \qquad (3)$$

Guided by Equation (2), we could then define
"machine-independent Kolmogorov complexity" via
$K(s) := -\log P(s)$.

But how can we find such a probability distribution $\mu$ on the
computers? The key idea here is to compare the capabilities
of the two computers to emulate each other. Namely, by
comparing $U_{\text{nice}}$ and $U_{\text{bad}}$, one observes that

• it is very "easy" for the computer $U_{\text{bad}}$ to emulate the
computer $U_{\text{nice}}$: just prepend a "1" to the input. On the
other hand,

• it is very "difficult" for the computer $U_{\text{nice}}$ to emulate
$U_{\text{bad}}$: to do the simulation, we have to supply $U_{\text{nice}}$ with
the long string $s$ as additional data.



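[Figure: $U_{\text{nice}}$ and $U_{\text{bad}}$ emulating each other; the emulation of $U_{\text{bad}}$ by $U_{\text{nice}}$ is labeled "difficult".]

Fig. 1. Computers that emulate each other.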

The idea is that this observation holds true more generally:
"unnatural" computers are harder to emulate. There are two
obvious approaches to construct some computer probability
$\mu$ from this observation; interestingly, both turn out to be
equivalent:

• The situation in Figure 1 looks like the graph of some
Markov process. If one starts with either one of the two
computers depicted there and interprets the line widths
as transition probabilities, then in the long run of more
and more moves, one tends to have larger probability
to end up at $U_{\text{nice}}$ than at $U_{\text{bad}}$. So let us apply this
idea more generally and define a Markov process of all
the universal computers, randomly emulating each other.
If the process has a stationary distribution (e.g. if it is
positive recurrent), this is a good candidate for computer
probability.

• Similarly as in Equation (1), there should be a simple
way to define probabilities $P_U(V)$ for computers $U$ and
$V$, that is, the probability that $U$ emulates $V$ on random
input. Then, whatever the desired computer probability $\mu$
looks like, to make any sense, it should satisfy

$$\mu(U) = \sum_{V \text{ universal}} \mu(V) P_V(U).$$

But if we enumerate all universal computers as
$\{U_1, U_2, U_3, \ldots\}$, this equation can be written as

$$(\mu(U_1), \mu(U_2), \ldots) \cdot \begin{pmatrix} P_{U_1}(U_1) & P_{U_1}(U_2) & \cdots \\ P_{U_2}(U_1) & P_{U_2}(U_2) & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} = (\mu(U_1), \mu(U_2), \ldots).$$

Thus, we should look for the unknown stationary
probability eigenvector $\mu$ of the "emulation matrix"

$$\left(P_{U_i}(U_j)\right)_{i,j \in \mathbb{N}}.$$

Clearly, both ideas are equivalent if the probabilities $P_U(V)$
are the transition probabilities of the aforementioned Markov
process.
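As a toy numerical illustration of the eigenvector equation (not a computation of actual emulation probabilities), the sketch below assumes a hypothetical 2×2 emulation matrix for $U_{\text{nice}}$ and $U_{\text{bad}}$: $U_{\text{bad}}$ passes half of its mass to $U_{\text{nice}}$ in one step, while $U_{\text{nice}}$ reaches $U_{\text{bad}}$ only with a tiny, made-up probability `eps`. Power iteration $\mu \leftarrow \mu \cdot P$ then approximates the stationary vector.

```python
def stationary_by_power_iteration(p, steps=200):
    """Approximate mu with mu = mu * P by iterating from the uniform vector (p: row-stochastic, list of rows)."""
    n = len(p)
    mu = [1.0 / n] * n
    for _ in range(steps):
        mu = [sum(mu[i] * p[i][j] for i in range(n)) for j in range(n)]
    return mu


eps = 2.0 ** -20                 # made-up probability that U_nice emulates U_bad in one step
P = [[1 - eps, eps],             # row for U_nice
     [0.5, 0.5]]                 # row for U_bad (assumed: half the mass moves to U_nice)
mu_nice, mu_bad = stationary_by_power_iteration(P)
print(mu_nice, mu_bad)           # approximately 1/(1+2*eps) and 2*eps/(1+2*eps)
```

For this matrix the stationary vector is $(1, 2\varepsilon)/(1 + 2\varepsilon)$, so the hard-to-emulate computer receives the small weight, as the heuristic above suggests.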

Now we give a synopsis of the paper and explain our main 
results: 

• Section II contains some notational preliminaries, and defines
the output frequency of a string as the frequency with which
this string is output by a computer. For prefix computers,
this notion equals algorithmic probability (Example 2.2).

• In Section III, we define the emulation Markov process
that we have motivated above, and analyze whether it has a
stationary distribution or not. Here is the construction
for the most important case (the case of the full set
of computers) in a nutshell: we say that a computer $C$
emulates computer $D$ via the string $x$, and write $C \xrightarrow{x} D$
and $D = (C \xrightarrow{x})$, if $C(xy) = D(y)$ for all strings
$y$. A computer is universal if it emulates every other
computer. Given a universal computer $C$, at least one of
the two computers $(C \xrightarrow{0})$ and $(C \xrightarrow{1})$ must be universal,
too.

Thus, we can consider the universal computers as the
vertices of a graph, with directed edges going from
$U$ to $V$ if $U \xrightarrow{0} V$ or $U \xrightarrow{1} V$. Every vertex
(universal computer) has either one or two outgoing edges
(corresponding to the two bits). The random walk on this
connected graph defines a Markov process: we start at
some computer, follow the outgoing edges, and if there
are two edges, we follow each of them with probability
$\frac{1}{2}$. This is schematically depicted in Figure 2.
If this process had a stationary distribution, this would
be a good candidate for a natural algorithmic probability
measure on the universal computers. Unfortunately,
no stationary distribution exists: this Markov process is
transient.

We prove this in Theorem 3.13. The idea is to construct
a sequence of universal computers $M_1, M_2, M_3, \ldots$ such
that $M_i$ emulates $M_{i+1}$ with high probability; in fact,
with probability tending to 1 fast as $i$ gets large. The
corresponding part of the emulation Markov process is
depicted in Figure 3. The outgoing edges in the upwards
direction lead back to a fixed universal reference computer,
which ensures that every computer $M_i$ is universal.

Fig. 2. Schematic diagram of the emulation Markov process. Note that the
zeroes and ones represent input bits, not transition probabilities; for example,
$U_1(1y) = U_3(y)$ for every string $y$. In our notation, we have for example
$U_1 \xrightarrow{1} U_3$.
Fig. 3. The construction in Theorem 3.13 proves that the emulation Markov
chain is transient, and no stationary distribution exists (the numbers are
transition probabilities). It is comparable to a "computer virus" in the sense
that each $M_i$ emulates "many" slightly modified copies of itself.

As our Markov process has only transition probabilities
$\frac{1}{2}$ and 1, the edges going from $M_i$ to $M_{i+1}$ in fact
consist of several transitions (edges). As those transition
probabilities are constructed to tend to 1 very fast, the
probability to stay on this $M_i$-path forever (and not
return to any other computer) is positive, which forces
the process to be transient.

Yet, it is still possible to construct analogous Markov processes
for restricted sets of computers $\Phi$. Some of those
sets yield processes which have stationary distributions;
a non-trivial example is given in Example 3.14.

• For those computer sets $\Phi$ with positive recurrent emulation
process, the corresponding computer probability has
nice properties that we study in Section IV. The computer
probability induces in a natural way a probability
distribution on the strings $s \in \{0,1\}^*$ (Definition 4.1)
as the probability that the random walk described above
encounters some output which equals $s$. This probability
is computer-independent and can be written in several
equivalent ways (Theorem 4.2).

• A symmetry property of computer probability yields
another simple and interesting proof why, for the set of all
computers and for many other natural computer sets,
the corresponding Markov process cannot be positive
recurrent (Theorem 4.7). In short, if $\sigma$ is a computable
permutation, then a computer $C$ and the output-permuted
computer $\sigma \circ C$ must have the same probability as long as
both are in the computer set $\Phi$ (Theorem 4.6). If there are
infinitely many of them, they all must have probability
zero, which contradicts positive recurrence.

• For the same reason, there cannot be one particular "natural"
choice of a computer set $\Phi$ with positive recurrent
Markov process, because $\sigma \circ \Phi$ is always another good
(positive recurrent) candidate, too (Theorem 4.8).

• This has a nice physical interpretation, which we explain
in Section V: algorithmic probability and Kolmogorov
complexity always contain at least the ambiguity which
is given by permuting the output strings. This permutation
can be interpreted as "renaming" the objects that the
strings are describing.

We argue that this kind of ambiguity will be present in any 
attempt to eliminate machine-dependence from algorithmic 
probability or complexity, even if it is different from the 
approach in this paper. This conclusion can be seen as the 
main result of this work. 

Finally, we show in the appendix that the string probability 
that we have constructed equals, under certain conditions, the 
weighted average of output frequency — this is a particularly 
unexpected and beautiful result (Theorem A.6), which needs
some technical steps to be proved. The main tool is the study 
of input transformations, i.e., to permute the strings before the 
computation. The appendix is the technically most difficult 
part of this paper and can be skipped on first reading. 

II. Preliminaries and Output Frequency 

We start by fixing some notation. In this paper, we only
consider finite binary strings, which we denote by

$$\{0,1\}^* := \bigcup_{n=0}^{\infty} \{0,1\}^n = \{\lambda, 0, 1, 00, 01, \ldots\}.$$

The symbol $\lambda$ denotes the empty string, and we write the
length of a string $s \in \{0,1\}^*$ as $|s|$, while the cardinality of a
set $S$ is denoted $\#S$. To avoid confusion with the composition
of mappings, we denote the concatenation of strings with the
symbol $\otimes$, e.g.

$$101 \otimes 001 = 101001.$$

In particular, we have $|\lambda| = 0$ and $|x \otimes y| = |x| + |y|$. A computer
$C$ is a partial-recursive function $C : \{0,1\}^* \to \{0,1\}^*$,
and we denote the set of all computers by $\Xi$. Note that our
computers do not necessarily have to have prefix-free domain
(unless otherwise stated). If $C \in \Xi$ does not halt on some input
$x \in \{0,1\}^*$, then we write $C(x) = \infty$ as an abbreviation for
the fact that $C(x)$ is undefined. Thus, we can also interpret
computers $C$ as mappings from $\{0,1\}^*$ to $\overline{\{0,1\}^*}$, where

$$\overline{\{0,1\}^*} := \{0,1\}^* \cup \{\infty\}.$$

As usual, we denote by $K_C(x)$ the Kolmogorov complexity of
the string $x \in \{0,1\}^*$ with respect to the computer $C \in \Xi$,

$$K_C(x) := \min\{|s| \mid s \in \{0,1\}^*,\ C(s) = x\},$$

or as $\infty$ if this set is empty.
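For a computer that halts on every input, the minimum in the definition of $K_C(x)$ can be found by trying programs in order of increasing length. The following sketch does this under two simplifying assumptions: the hypothetical toy computer `c_double` is total (for general partial-recursive computers the search cannot simply wait for each program, and $K_C$ is not computable), and the search is cut off at `max_len`, so `None` means "not found within the bound" rather than a proven $\infty$.

```python
from itertools import product


def c_double(p):
    """Hypothetical total toy computer: writes every input bit twice, e.g. '10' -> '1100'."""
    return "".join(b + b for b in p)


def kolmogorov_bounded(c, x, max_len):
    """Smallest |p| with c(p) == x among programs p of length <= max_len, or None if none is found."""
    for n in range(max_len + 1):              # shorter programs first
        for bits in product("01", repeat=n):
            if c("".join(bits)) == x:
                return n
    return None


print(kolmogorov_bounded(c_double, "110011", 6))  # 3, via p = '101'
print(kolmogorov_bounded(c_double, "10", 6))      # None: '10' is not an output of c_double
```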
What would be a first, naive try to define algorithmic
probability? Since we do not restrict our approach to prefix
computers, we cannot take Equation (1) as a definition. Instead,
we may try to count how often a string is produced by the
computer as output:

Definition 2.1 (Output Frequency):
For every $C \in \Xi$, $n \in \mathbb{N}_0$ and $s \in \{0,1\}^*$, we set

$$\mu_C^{(n)}(s) := \frac{\#\{x \in \{0,1\}^n \mid C(x) = s\}}{2^n}.$$

For later use in Section III, we also define for every $C, D \in \Xi$
and $n \in \mathbb{N}_0$

$$\mu_C^{(n)}(D) := \frac{\#\{x \in \{0,1\}^n \mid C \xrightarrow{x} D\}}{2^n},$$

where the expression $C \xrightarrow{x} D$ is given in Definition 3.1.
Our final definition of algorithmic probability will look very
different, but it will, surprisingly, turn out to be closely related
to this output frequency notion.

The existence of the limit $\lim_{n \to \infty} \mu_C^{(n)}(s)$ depends on the
computer $C$ and may be hard to decide, but in the special
case of prefix computers, the limit exists and agrees with
the classical notion of algorithmic probability as given in
Equation (1):

Example 2.2 (Prefix Computers): A computer $C \in \Xi$ is
called prefix if the following holds:

$$C(x) \neq \infty \implies C(x \otimes y) = \infty \quad \text{for every } y \neq \lambda.$$

This means that if $C$ halts on some input $x \in \{0,1\}^*$, it
must not halt on any extension $x \otimes y$. Such computers are
traditionally studied in algorithmic information theory. To fit
our approach, we need to modify the definition slightly. Call a
computer $C_p \in \Xi$ prefix-constant if the following holds true:

$$C_p(x) \neq \infty \implies C_p(x \otimes y) = C_p(x) \quad \text{for every } y \in \{0,1\}^*.$$

It is easy to see that for every prefix computer $C$, one can find
a prefix-constant computer $C_p$ with $C_p(x) = C(x)$ whenever
$C(x) \neq \infty$. It is constructed in the following way: Suppose
$x \in \{0,1\}^*$ is given as input into $C_p$; then it

• computes the set of all prefixes $\{x_i\}_{i=0}^{|x|}$ of $x$ (e.g. for
$x = 100$ we have $x_0 = \lambda$, $x_1 = 1$, $x_2 = 10$ and $x_3 = 100$),

• starts $|x| + 1$ simulations of $C$ at the same time, which
are supplied with $x_0$ up to $x_{|x|}$ as input,

• waits until one of the simulations produces an output $s \in \{0,1\}^*$
(if this never happens, $C_p$ will loop forever),

• finally outputs $s$.

Fix an arbitrary string $s \in \{0,1\}^*$. Consider the set

$$T^{(n)}(s) := \{x \in \{0,1\}^* \mid |x| \leq n,\ C(x) = s\}.$$

Every string $x \in T^{(n)}(s)$ can be extended (by concatenation)
to a string $x'$ of length $n$. By construction, it follows that
$C_p(x') = s$. There are $2^{n-|x|}$ possible extensions $x'$; since $C$ is
prefix, extensions of different elements of $T^{(n)}(s)$ are different, and
every $x' \in \{0,1\}^n$ with $C_p(x') = s$ arises in this way. Thus

$$\mu_{C_p}^{(n)}(s) = \frac{\sum_{x \in T^{(n)}(s)} 2^{n-|x|}}{2^n} = \sum_{x \in \{0,1\}^* :\ |x| \leq n,\ C(x) = s} 2^{-|x|}.$$

It follows that the limit $\mu_{C_p}(s) := \lim_{n \to \infty} \mu_{C_p}^{(n)}(s)$ exists, and
it holds

$$\mu_{C_p}(s) = \sum_{x \in \{0,1\}^* :\ C(x) = s} 2^{-|x|} = P_C(s),$$

so the output frequency as given in Definition 2.1 converges
for $n \to \infty$ to the classical algorithmic probability as given
in Equation (1). Note that $\Omega_C = 1 - \mu_{C_p}(\infty)$. □

It is easy to construct examples of computers which are
not prefix, but which have an output frequency which either
converges, or at least does not tend to zero as $n \to \infty$.
Thus, the notion of output frequency generalizes the idea of
algorithmic probability to a larger class of computers.

III. Stationary Computer Probability

As explained in the introduction, it will be an essential part
of this work to analyze in detail how "easily" one computer $C$
emulates another computer $D$. Our first definition specifies
what we mean by "emulation":

Definition 3.1 (Emulation): A computer $C \in \Xi$ emulates
the computer $D \in \Xi$ via $x \in \{0,1\}^*$, denoted

$$C \xrightarrow{x} D \quad \text{resp.} \quad D = \left(C \xrightarrow{x}\right),$$

if $C(x \otimes s) = D(s)$ for every $s \in \{0,1\}^*$. We write $C \to D$
if there is some $x \in \{0,1\}^*$ such that $C \xrightarrow{x} D$.

It follows easily from the definition that $C \xrightarrow{\lambda} C$ and

$$C \xrightarrow{x} D \ \text{ and } \ D \xrightarrow{y} E \implies C \xrightarrow{x \otimes y} E.$$

Now that we have defined emulation, it is easy to extend the
notion of Kolmogorov complexity to emulation complexity:

Definition 3.2 (Emulation Complexity): For every $C, D \in \Xi$,
the emulation complexity $K_C(D)$ is defined as

$$K_C(D) := \min\left\{|s| \;:\; s \in \{0,1\}^*,\ C \xrightarrow{s} D\right\} \qquad (4)$$

or as $\infty$ if the corresponding set is empty.
Note that similar definitions have already appeared in the
literature, see for example Def. 4.4 and Def. 4.5 in [8], or
the definition of the constant "sim(C)" in [9].
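The convergence $\mu_{C_p}^{(n)}(s) \to P_C(s)$ in Example 2.2 can be observed numerically on a small case. In this sketch, `c_prefix` is a hypothetical prefix computer that halts exactly on the inputs $1^k 0$ and outputs the bit $k \bmod 2$, so $P_C(0) = \frac{2}{3}$ and $P_C(1) = \frac{1}{3}$. Because this toy computer decides its own domain, the prefixes can be tried one after another; a genuine partial computer would require the parallel (dovetailed) simulations described in Example 2.2.

```python
from fractions import Fraction
from itertools import product


def c_prefix(x):
    """Hypothetical toy prefix computer: halts exactly on 1^k 0 and outputs k mod 2 as a bit."""
    k = len(x) - 1
    if k >= 0 and x == "1" * k + "0":
        return str(k % 2)
    return None                       # None stands for C(x) = infinity


def prefix_constant(c):
    """C_p from Example 2.2: output C on the (unique) prefix of x on which C halts."""
    def c_p(x):
        # The toy c decides its own domain, so prefixes are tried in sequence;
        # for a genuine partial computer the |x|+1 simulations must be dovetailed.
        for i in range(len(x) + 1):
            out = c(x[:i])
            if out is not None:
                return out
        return None
    return c_p


def output_frequency(c, s, n):
    """mu_C^{(n)}(s) from Definition 2.1: fraction of the 2^n inputs of length n with C(x) = s."""
    hits = sum(1 for bits in product("01", repeat=n) if c("".join(bits)) == s)
    return Fraction(hits, 2 ** n)


c_p = prefix_constant(c_prefix)
for n in (1, 2, 4, 8):
    print(n, output_frequency(c_p, "0", n))   # approaches P_C("0") = 2/3 from below
```

For $n = 8$ the sketch yields $\mu_{C_p}^{(8)}(0) = \frac{85}{128}$ and $\mu_{C_p}^{(8)}(1) = \frac{85}{256}$; the remaining mass $\frac{1}{256}$ is $\mu_{C_p}^{(8)}(\infty)$, carried by the single non-halting input $1^8$.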

Definition 3.3 (Universal Computer): Let $\Phi \subseteq \Xi$ be a set
of computers. If there exists a computer $U \in \Phi$ such that
$U \to X$ for every $X \in \Phi$, then $\Phi$ is called connected,
and $U$ is called a $\Phi$-universal computer. We use the notation
$\Phi^U := \{C \in \Phi \mid C \text{ is } \Phi\text{-universal}\}$, and we write
$\overline{\Phi^U} := \{C \in \Xi \mid C \to D \ \forall D \in \Phi \text{ and } \exists X \in \Phi : X \to C\}$.

Note that $\Phi^U \subseteq \overline{\Phi^U}$ and $\Phi^U = \emptyset \iff \overline{\Phi^U} = \emptyset$. Examples of
connected sets of computers include the set $\Xi$ of all computers
and the set of prefix-constant computers, whereas the set of
computers which always halt on every input cannot be connected,
as is easily seen by diagonalization. For convenience,
we give a short proof of the first statement:

Proposition 3.4: The set of all computers $\Xi$ is connected.

Proof. It is well known that there is a computer $U$ that takes
a description $d_M \in \{0,1\}^*$ of any computer $M \in \Xi$ together
with some input $x \in \{0,1\}^*$ and simulates $M$ on input $x$, i.e.

$$U(\langle d_M, x \rangle) = M(x) \quad \text{for every } x \in \{0,1\}^*,$$

where $\langle \cdot, \cdot \rangle : \{0,1\}^* \times \{0,1\}^* \to \{0,1\}^*$ is a bijective and
computable encoding of two strings into one. We can construct
the encoding in such a way that $\langle d_M, x \rangle = \overline{d_M} \otimes x$, i.e. the
description is encoded into some prefix code $\overline{d_M}$ that is attached
to the left-hand side of $x$. It follows that $U \xrightarrow{\overline{d_M}} M$, and since
this works for every $M \in \Xi$, $U$ is $\Xi$-universal. □
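The mechanism of the proof (a prefix-coded description placed in front of the input) can be sketched with a finite list of computers standing in for an enumeration of $\Xi$. This is an illustration only: the machines are hypothetical examples, the code $1^i 0$ for machine number $i$ is an assumed encoding, and a real universal computer would interpret program texts instead of indexing a list.

```python
def make_u(machines):
    """Toy 'universal' computer over a finite list: input 1^i 0 x runs machines[i] on x."""
    def u(inp):
        i = 0
        while i < len(inp) and inp[i] == "1":
            i += 1
        if i == len(inp) or i >= len(machines):
            return None                      # no complete, valid description: undefined
        return machines[i](inp[i + 1:])
    return u


def code(i):
    """Assumed prefix code 1^i 0 for the description of machine number i."""
    return "1" * i + "0"


def identity(x):
    return x


def reverse(x):
    return x[::-1]


def prepend_one(x):
    return "1" + x


machines = [identity, reverse, prepend_one]  # hypothetical example computers
u = make_u(machines)
# U emulates machines[i] via code(i): U(code(i) + x) == machines[i](x), spot-checked here.
for i, m in enumerate(machines):
    assert all(u(code(i) + x) == m(x) for x in ["", "0", "1", "0110"])
print(u(code(1) + "001"))                    # '100', i.e. reverse('001')
```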

Here is a basic property of Kolmogorov and emulation
complexity:

Theorem 3.5 (Invariance of Complexities): Let $\Phi \subseteq \Xi$ be
connected; then for every $U \in \Phi^U$ and $V \in \Phi$ it holds that

$$K_U(D) \leq K_U(V) + K_V(D) \quad \text{for every } D \in \Phi,$$
$$K_U(s) \leq K_U(V) + K_V(s) \quad \text{for every } s \in \{0,1\}^*.$$

Proof. Since $U \in \Phi^U$, it holds $U \to V$. Let $x$ be a shortest
string such that $U(x \otimes t) = V(t)$ for every $t \in \{0,1\}^*$, i.e.
$|x| = K_U(V)$. If $p_D$ resp. $p_s$ are shortest strings such that
$V \xrightarrow{p_D} D$ resp. $V(p_s) = s$, then $|p_D| = K_V(D)$ and $|p_s| = K_V(s)$,
and additionally $U \xrightarrow{x \otimes p_D} D$ and $U(x \otimes p_s) = s$. Thus,
$K_U(D) \leq |x \otimes p_D| = |x| + |p_D|$ and $K_U(s) \leq |x \otimes p_s| = |x| + |p_s|$.
(If $K_V(D)$ or $K_V(s)$ is infinite, the corresponding inequality holds trivially.) □


Suppose some computer $C \in \Xi$ emulates another computer
$E$ via the string 10, i.e. $C \xrightarrow{10} E$. We can decompose this
into two steps: Let $D := (C \xrightarrow{1})$; then

$$C \xrightarrow{1} D \quad \text{and} \quad D \xrightarrow{0} E.$$

Similarly, we can decompose every emulation $C \xrightarrow{x} D$ into
$|x|$ parts, just by parsing the string $x$ bit by bit, while getting
a corresponding "chain" of emulated computers. A clear way
to illustrate this situation is in the form of a tree, as shown in
Figure 4. We start at the root $\lambda$. Since $C \xrightarrow{\lambda} C$, this string
corresponds to the computer $C$ itself. Then, we are free to
choose 0 or 1, yielding the computer $(C \xrightarrow{0})$ or $(C \xrightarrow{1}) = D$,
respectively. Ending up with $D$, we can choose the next bit
(taking a 0, we will end up with $E = (D \xrightarrow{0}) = (C \xrightarrow{10})$)
and so on.
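Viewed as functions, $(C \xrightarrow{x})$ is simply $C$ with its input prefixed by $x$, which makes the bit-by-bit decomposition $(C \xrightarrow{10}) = ((C \xrightarrow{1}) \xrightarrow{0})$ immediate. In this sketch, computers are Python functions on bit strings, `c_toy` is a hypothetical example, and equality of computers is only spot-checked on a few inputs, since equality of partial-recursive functions is undecidable in general.

```python
def emulate(c, x):
    """The computer (C --x-->), i.e. the map y -> C(x + y)."""
    return lambda y: c(x + y)


def c_toy(inp):
    """Hypothetical toy computer: reverses its input."""
    return inp[::-1]


d = emulate(c_toy, "1")          # D := (C --1-->)
e_direct = emulate(c_toy, "10")  # E := (C --10-->)
e_stepwise = emulate(d, "0")     # (D --0-->), parsing x = 10 bit by bit

tests = ["", "0", "1", "0110"]
print(all(e_direct(y) == e_stepwise(y) for y in tests))  # True on these inputs
```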

In general, some of the emulated computers will themselves
be elements of $\Phi$ and some not. As in Figure 4, we can mark
every path that leads to a computer that is itself an element of
$\Phi$ by a thick line. (In this case, for example, $C, D, E \in \Phi$, but
$(C \xrightarrow{11}) \notin \Phi$.) If we want to restrict the process of parsing
through the tree to the marked (thick) paths, then we need the
following property:

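Definition 3.6: A set of computers $\Phi \subseteq \Xi$ is called branching
if for every $C \in \Phi$, the following two conditions are
satisfied:

• For every $x, y \in \{0,1\}^*$, it holds

$$\left(C \xrightarrow{x \otimes y}\right) \in \Phi \implies \left(C \xrightarrow{x}\right) \in \Phi.$$

• There is some $x \in \{0,1\}^* \setminus \{\lambda\}$ such that $(C \xrightarrow{x}) \in \Phi$.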

If $ is branching, we can parse through the corresponding 
marked subtree without encountering any dead end, with the 
possibility to reach every leaf of the subtree. In particular, 
these requirements are fulfilled by sets of universal computers: 



Fig. 4. Emulation as a tree, with a branching subset $\Phi$ (bold lines).



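Proposition 3.7: Let $\Phi \subseteq \Xi$ be connected and $\#\Phi^U \geq 2$;
then $\overline{\Phi^U}$ is branching.

Proof. Let $C \in \overline{\Phi^U}$ and $(C \xrightarrow{x \otimes y}) \in \overline{\Phi^U}$; then $(C \xrightarrow{x \otimes y}) \to D$
for every $D \in \Phi$, and since $(C \xrightarrow{x}) \xrightarrow{y} (C \xrightarrow{x \otimes y})$, also $(C \xrightarrow{x}) \to D$ for every $D \in \Phi$.
Moreover, there is some $X \in \Phi$ such that $X \to C$, so in
particular $X \to (C \xrightarrow{x})$. Thus, $(C \xrightarrow{x}) \in \overline{\Phi^U}$.

On the other hand, let $C \in \overline{\Phi^U}$. Since $\#\overline{\Phi^U} \geq \#\Phi^U \geq 2$, there is
some $D \in \overline{\Phi^U}$ such that $C \neq D$. By definition of $\overline{\Phi^U}$, there is
some $A \in \Phi$ such that $A \to D$. Since $C$ emulates every
computer in $\Phi$, we have $C \to A$, so $C \xrightarrow{z} D$ for some
$z \neq \lambda$ (as $C \neq D$). □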

As illustrated in the bold subtree in Figure 4, we can
define the process of a random walk on this subtree if its
corresponding computer subset $\Phi$ is branching: we start at the
root $\lambda$, follow the branches, and at every bifurcation, we turn
"left or right" (i.e. input an additional 0 or 1) with probability
$\frac{1}{2}$. This random walk generates a probability distribution on
the subtree:

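Definition 3.8 (Path and Computer Probability): Let $\Phi \subseteq \Xi$
be branching and let $C \in \Phi$. We define the $\Phi$-tree of $C$ as the
set of all inputs $x \in \{0,1\}^*$ that make $C$ emulate a computer
in $\Phi$ and denote it by $C^{-1}(\Phi)$, i.e.

$$C^{-1}(\Phi) := \left\{x \in \{0,1\}^* \;:\; \left(C \xrightarrow{x}\right) \in \Phi\right\}.$$

To every $x$ in the $\Phi$-tree of $C$, we can associate its path
probability $\mu_{C^{-1}(\Phi)}(x)$ as the probability of arriving at $x$ on
a random walk on this tree. Formally, $\mu_{C^{-1}(\Phi)}(\lambda) := 1$ and

$$\mu_{C^{-1}(\Phi)}(x \otimes b) := \begin{cases} \frac{1}{2}\, \mu_{C^{-1}(\Phi)}(x) & \text{if } x \otimes \bar{b} \in C^{-1}(\Phi), \\ \mu_{C^{-1}(\Phi)}(x) & \text{otherwise} \end{cases}$$

for every bit $b \in \{0,1\}$ with $x \otimes b \in C^{-1}(\Phi)$, where $\bar{b}$ denotes
the inverse bit. The associated $n$-step computer probability of
$D \in \Phi$ is defined as the probability of arriving at computer
$D$ on a random walk of $n$ steps on this tree, i.e.

$$\mu_C^{(n)}(D \mid \Phi) := \sum_{x \in \{0,1\}^n \cap C^{-1}(\Phi) :\ C \xrightarrow{x} D} \mu_{C^{-1}(\Phi)}(x).$$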



It is clear that for $\Phi = \Xi$, we get back the notion of output
frequency as given in Definition 2.1: for every $C, D \in \Xi$, it
holds

$$\mu_C^{(n)}(D) = \mu_C^{(n)}(D \mid \Xi).$$

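The condition that $\Phi$ shall be branching guarantees that
$\sum_{x \in \{0,1\}^n \cap C^{-1}(\Phi)} \mu_{C^{-1}(\Phi)}(x) = 1$ for every $n \in \mathbb{N}_0$, i.e. the
conservation of probability. For example, the path probability
in Figure 4 has values $\mu_{C^{-1}(\Phi)}(0) = \mu_{C^{-1}(\Phi)}(1) = \frac{1}{2}$,
$\mu_{C^{-1}(\Phi)}(00) = \mu_{C^{-1}(\Phi)}(01) = \frac{1}{4}$, $\mu_{C^{-1}(\Phi)}(10) = \frac{1}{2}$,
$\mu_{C^{-1}(\Phi)}(001) = \mu_{C^{-1}(\Phi)}(010) = \frac{1}{4} = \mu_{C^{-1}(\Phi)}(100) = \mu_{C^{-1}(\Phi)}(101)$.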
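The recursive definition of the path probability translates directly into a level-by-level computation. The sketch below is a minimal version under the assumption that the $\Phi$-tree is given explicitly as a finite, prefix-closed set of strings; the tree used here is the one implied by the Figure 4 example values ($11$, $000$ and $011$ are not in the tree). For actual computer sets, membership $(C \xrightarrow{x}) \in \Phi$ is in general not decidable.

```python
from fractions import Fraction


def path_probabilities(tree, depth):
    """Path probabilities mu_{C^{-1}(Phi)}(x) of Definition 3.8 for all tree nodes up to the given depth."""
    mu = {"": Fraction(1)}
    frontier = [""]
    for _ in range(depth):
        nxt = []
        for x in frontier:
            children = [x + b for b in "01" if x + b in tree]
            for child in children:
                mu[child] = mu[x] / len(children)   # 1/2 if the sibling is in the tree, else 1
                nxt.append(child)
        frontier = nxt
    return mu


# Phi-tree of the Fig. 4 example (prefix-closed; 11, 000 and 011 are not in the tree)
tree = {"", "0", "1", "00", "01", "10", "001", "010", "100", "101"}
mu = path_probabilities(tree, 3)
print(mu["10"], mu["101"])                             # 1/2 1/4
print(sum(p for x, p in mu.items() if len(x) == 3))    # 1 (conservation of probability)
```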

It is almost obvious that the random walk on the subtree
that generates the computer probabilities $\mu_C^{(n)}(D \mid \Phi)$ is a
Markov process, which is the statement of the next lemma.
For later reference, we first introduce some notation for the
corresponding transition matrix:

Definition 3.9 (Emulation Matrix): Let $\Phi \subseteq \Xi$ be branching,
and enumerate the computers in $\Phi$ in arbitrary order:
$\Phi = \{C_1, C_2, \ldots\}$. Then, we define the (possibly infinite)
emulation matrix $E_\Phi$ as

$$(E_\Phi)_{i,j} := \mu_{C_i}^{(1)}(C_j \mid \Phi).$$

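Lemma 3.10 (Markovian Emulation Process): If $\Phi \subseteq \Xi$ is
branching, then the computer probabilities $\mu_C^{(n)}(\cdot \mid \Phi)$ are $n$-step
probabilities of some Markov process (which we also denote
$\Phi$) whose transition matrix is given by the emulation matrix
$E_\Phi$. Explicitly, with $\delta_i := (0, \ldots, 0, 1, 0, \ldots)$ (the 1 in the $i$-th position),

$$\mu_{C_i}^{(n)}(C_j \mid \Phi) = \left(\delta_i \cdot (E_\Phi)^n\right)_j \qquad (5)$$

for every $n \in \mathbb{N}_0$, and we have the Chapman-Kolmogorov
equation

$$\mu_C^{(m+n)}(D \mid \Phi) = \sum_{X \in \Phi} \mu_C^{(m)}(X \mid \Phi)\, \mu_X^{(n)}(D \mid \Phi) \qquad (6)$$

for every $m, n \in \mathbb{N}_0$. Also, $\Phi \subseteq \Xi$ (resp. $E_\Phi$) is irreducible if
and only if $C \to D$ for every $C, D \in \Phi$.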
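Proof. Equation (5) is trivially true for $n = 0$ and is shown in
full generality by induction (the proof details are not important
for the following argumentation and can be skipped):

$$\begin{aligned}
\mu_{C_i}^{(n+1)}(C_j \mid \Phi) &= \sum_{x \in \{0,1\}^n,\ b \in \{0,1\} :\ C_i \xrightarrow{x \otimes b} C_j} \mu_{C_i^{-1}(\Phi)}(x \otimes b) \\
&= \sum_{C_k \in \Phi}\ \sum_{x \in \{0,1\}^n :\ C_i \xrightarrow{x} C_k}\ \sum_{b \in \{0,1\} :\ C_k \xrightarrow{b} C_j} \mu_{C_i^{-1}(\Phi)}(x \otimes b) \\
&= \sum_{C_k \in \Phi}\ \sum_{x \in \{0,1\}^n :\ C_i \xrightarrow{x} C_k} \mu_{C_i^{-1}(\Phi)}(x)\, \mu_{C_k}^{(1)}(C_j \mid \Phi) \\
&= \sum_{k} \mu_{C_i}^{(n)}(C_k \mid \Phi)\, (E_\Phi)_{k,j} = \left(\delta_i \cdot (E_\Phi)^{n+1}\right)_j.
\end{aligned}$$

The Chapman-Kolmogorov equation follows directly from the
theory of Markov processes. The stochastic matrix $E_\Phi$ is
irreducible iff for every $i, j \in \mathbb{N}$ there is some $n \in \mathbb{N}$ such
that $0 < ((E_\Phi)^n)_{i,j} = \mu_{C_i}^{(n)}(C_j \mid \Phi)$, which is equivalent to the
existence of some $x \in \{0,1\}^n$ such that $C_i \xrightarrow{x} C_j$. □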
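Equation (5) says that $n$-step computer probabilities are read off from powers of the emulation matrix, and (6) then reduces to $E^{m+n} = E^m E^n$. The sketch below checks this on a hypothetical 3-computer emulation matrix (its entries are invented, but have the form that one-step probabilities take here: $\frac{1}{2}$ per child, or 1 for a single child), using exact fractions.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_pow(a, n):
    """a^n by repeated multiplication (n >= 0); a^0 is the identity matrix."""
    size = len(a)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, a)
    return result


h = Fraction(1, 2)
E = [[h, h, 0],      # hypothetical: C_1 --0--> C_1 and C_1 --1--> C_2
     [0, 0, 1],      # C_2 has a single child in the tree, namely C_3
     [h, 0, h]]      # C_3 --0--> C_1 and C_3 --1--> C_3

m, n = 2, 3
lhs = mat_pow(E, m + n)
rhs = mat_mul(mat_pow(E, m), mat_pow(E, n))
print(lhs == rhs)              # True: the Chapman-Kolmogorov equation (6) for this matrix
print(mat_pow(E, 3)[0])        # delta_1 * E^3: the 3-step probabilities starting from C_1
```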



The next proposition collects some relations between the 
emulation Markov process and the corresponding set of com- 
puters. We assume that the reader is familiar with the basic 
vocabulary from the theory of Markov chains. 

Proposition 3.11 (Irreducibility and Aperiodicity): Let
$\Phi \subseteq \Xi$ be a set of computers.

• $\Phi$ is irreducible $\iff \Phi = \Phi^U \iff \Phi \subseteq \overline{\Phi^U}$.

• If $\Phi$ is connected and $\#\Phi^U \geq 2$, then $\overline{\Phi^U}$ is irreducible
and branching.

• If $\Phi$ is branching, then we can define the period of $C \in \Phi$
as $d(C) := \gcd\{n \in \mathbb{N} \mid \mu_C^{(n)}(C \mid \Phi) > 0\}$ (resp. $\infty$ if
this set is empty). If, in addition, $\Phi$ is irreducible, then $d(C) = d(D) =: d < \infty$
holds true for every $C, D \in \Phi$. In this
case, $d$ will be called the period of $\Phi$, and if $d = 1$, then
$\Phi$ is called aperiodic.

Proof. To prove the first equivalence, suppose that $\Phi \subseteq \Xi$
is irreducible, i.e. for every $C, D \in \Phi$ it holds $C \to D$.
Thus, $\Phi$ is connected and every $C \in \Phi$ is $\Phi$-universal, so $\Phi \subseteq \Phi^U$,
and since always $\Phi^U \subseteq \Phi$, it follows that $\Phi = \Phi^U$. On the other hand,
if $\Phi = \Phi^U$, then for every $C, D \in \Phi$ it holds $C \to D$, since
$C \in \Phi^U$. Thus, $\Phi$ is irreducible. For the second equivalence,
suppose that $\Phi$ is irreducible; thus, $\Phi = \Phi^U \subseteq \overline{\Phi^U}$. If on the
other hand $\Phi \subseteq \overline{\Phi^U}$, it follows in particular for every $C \in \Phi$
that $C \to X$ for every $X \in \Phi$, so $\Phi$ is irreducible.

For the second statement, let $C, X \in \overline{\Phi^U}$ be arbitrary. By
definition of $\overline{\Phi^U}$, it follows that there is some $V \in \Phi$ such
that $V \to X$, and it holds $C \to V$, so $C \to X$, and $\overline{\Phi^U}$
is irreducible. By Proposition 3.7 and $\#\Phi^U \geq 2$, $\overline{\Phi^U}$
must be branching. The third statement is well known from
the theory of Markov processes. □
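For a finite chain, both notions in Proposition 3.11 can be read off the directed graph of nonzero transitions. The sketch below assumes small hypothetical adjacency lists (`adj[i]` lists the states reachable from state $i$ in one step) and truncates the gcd in the definition of the period at `max_len`, which is enough for these tiny examples but is a simplification of the definition.

```python
from math import gcd


def reachable(adj, start):
    """Set of states reachable from start in zero or more steps (depth-first search)."""
    seen, stack = {start}, [start]
    while stack:
        i = stack.pop()
        for j in adj[i]:
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def is_irreducible(adj):
    """True iff every state reaches every state, i.e. C --> D for all C, D."""
    n = len(adj)
    return all(reachable(adj, i) == set(range(n)) for i in range(n))


def period(adj, state, max_len=50):
    """gcd of the return times n <= max_len with positive probability (0 stands for an empty set)."""
    g = 0
    current = {state}
    for n in range(1, max_len + 1):
        current = {j for i in current for j in adj[i]}   # states reachable in exactly n steps
        if state in current:
            g = gcd(g, n)
    return g


cycle = [[1], [2], [0]]          # deterministic 3-cycle: irreducible with period 3
with_loop = [[0, 1], [2], [0]]   # a self-loop at state 0 makes the chain aperiodic
print(is_irreducible(cycle), period(cycle, 0))          # True 3
print(is_irreducible(with_loop), period(with_loop, 0))  # True 1
```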

A basic general result about Markov processes now gives 
us the desired absolute computer probability - almost, at least: 

Theorem 3.12 (Stationary Alg. Computer Probability): Let
$\Phi \subseteq \Xi$ be branching, irreducible and aperiodic. Then, for
every $C, D \in \Phi$, the limit ("computer probability")

$$\mu(D \mid \Phi) := \lim_{n \to \infty} \mu_C^{(n)}(D \mid \Phi)$$

exists and is independent of $C$. There are two possible cases:

(1) The Markov process which corresponds to $\Phi$ is transient
or null recurrent. Then,

$$\mu(D \mid \Phi) = 0 \quad \text{for every } D \in \Phi.$$

(2) The Markov process which corresponds to $\Phi$ is positive
recurrent. Then,

$$\mu(D \mid \Phi) > 0 \quad \text{for every } D \in \Phi, \quad \text{and} \quad \sum_{D \in \Phi} \mu(D \mid \Phi) = 1.$$

In this case, the vector $\mu_\Phi := (\mu(C_1 \mid \Phi), \mu(C_2 \mid \Phi), \ldots)$ is
the unique stationary probability eigenvector of $E_\Phi$, i.e.
the unique probability vector solution to $\mu_\Phi \cdot E_\Phi = \mu_\Phi$.
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every x £ {0,1}*. Thus, (m 
for every 0* ^ s G {0, 1}\ and so 



^(M l+1 \E U ) = 1 - 2~ 4 for every i G N. 

Iterated application of the Chapman-Kolmogorov equation (O 
yields for every n G N 

M i 1 / ) i (Af 2 |S^).^ ) 2 (M 3 |S^) 



Note that we have derived this result under quite weak con- 
ditions — e.g. in contrast to classical algorithmic probability, 
we do not assume that our computers have prefix-free domain. 
Nevertheless, we are left with the problem to determine 
whether a given set $ of computers is positive recurrent (case 
(2) given above) or not (case (1)). 

The most interesting case is $ = E v , i.e. the set of 
computers that are universal in the sense that they can simulate 
every other computer without any restriction. This set is 
"large" — apart from universality, we do not assume any 

additional property like e.g. being prefix. By Proposition 13.1 II /ij^~ 2+ '" + (M„ \E U ) 
E u is irreducible and branching. Moreover, fix any universal 
computer U G E u and consider the computer V G H, given 
by 

A if x = A 

V(x) := { V(s) if x = 0fe3s 
U(s) if x = 1 ® s 



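As $V \xrightarrow{1} U$, we know that $V \in \mathfrak{U}$, and since $V \xrightarrow{0} V$, it follows that $\mu^{(1)}_V(V|\mathfrak{U}) > 0$, and so $d(V) = 1 = d(\mathfrak{U})$. Hence $\mathfrak{U}$ is aperiodic.

So is $\mathfrak{U}$ positive recurrent or not? Unfortunately, the answer turns out to be negative: $\mathfrak{U}$ is transient. The idea of the proof is to construct a sequence of universal computers $M_1, M_2, M_3, \ldots$ such that each computer $M_i$ emulates the next computer $M_{i+1}$ with large probability, that is, with a probability that tends to one as $i$ gets large. Thus, starting the random walk on, say, $M_1$, it will with positive probability stay on this $M_i$-path forever and never return to $M_1$. See also Figure 3 in the Introduction for an illustration.

Theorem 3.13 (Markoff Chaney Virus):
$\mathfrak{U}$ is transient, i.e. there is no stationary algorithmic computer probability on the universal computers.

Proof. Let $U \in \mathfrak{U}$ be an arbitrary universal computer with $U(\lambda) = 0$. We define another computer $M_1 \in \mathfrak{C}$ as follows: if some string $s \in \{0,1\}^*$ is supplied as input, then $M_1$

• splits the string $s$ into parts $s_1, s_2, \ldots, s_k, s_{\rm tail}$ such that $s = s_1 \otimes s_2 \otimes \cdots \otimes s_k \otimes s_{\rm tail}$ and $|s_i| = i$ for every $1 \leq i \leq k$. We also demand that $|s_{\rm tail}| < k+1$, which makes the splitting unique (for example, if $s = 101101101011$, then $s_1 = 1$, $s_2 = 01$, $s_3 = 101$, $s_4 = 1010$ and $s_{\rm tail} = 11$),

• tests if there is any $i \in \{1, \ldots, k\}$ such that $s_i = 0^i$ (i.e. $s_i$ contains only zeros). If yes, then $M_1$ computes and outputs $U(s_{i+1} \otimes \cdots \otimes s_k \otimes s_{\rm tail})$ (if there are several $i$ with $s_i = 0^i$, it takes the smallest one). If not, then $M_1$ outputs $1^k = 1 \ldots 1$.

Let $M_2(x) := M_1(1 \otimes x)$, $M_3(x) := M_2(11 \otimes x)$, and so on; in general $M_{i+1}(x) := M_i(1^i \otimes x)$ for all $x$, resp. $M_i \xrightarrow{1^i} M_{i+1}$. We also have $M_i \xrightarrow{0^i} U$, so $M_i \in \mathfrak{U}$ for every $i \in \mathbb{N}$. Thus, the computers $M_i$ are all universal. Also, since $M_i(\lambda) = M_1(1^{1+2+\cdots+(i-1)}) = 1^{i-1}$, the computers $M_i$ are mutually different from each other, i.e. $M_i \neq M_j$ for $i \neq j$.

Now consider the computers that $M_i$ emulates via strings $s$ with $|s| = i$ but $s \neq 0^i$. It holds $M_i(s \otimes x) = M_1(1 \otimes 11 \otimes \cdots \otimes 1^{i-1} \otimes s \otimes x)$. The only property of $s$ that affects the outcome of $M_1$'s computation is the property to be different from $0^i$. But this property is shared by the string $1^i$, i.e. $M_1(1 \otimes 11 \otimes \cdots \otimes 1^{i-1} \otimes s \otimes x) = M_1(1 \otimes 11 \otimes \cdots \otimes 1^{i-1} \otimes 1^i \otimes x)$, resp. $M_i(s \otimes x) = M_i(1^i \otimes x)$ for every $x \in \{0,1\}^*$. Hence $M_i \xrightarrow{s} M_{i+1}$ for each of the $2^i - 1$ strings $s \in \{0,1\}^i \setminus \{0^i\}$. All computers visited on the way are universal (each of them can still emulate $U$ via a block of zeros), so every bit is chosen with probability $1/2$, and

$$\mu^{(i)}_{M_i}(M_{i+1}|\mathfrak{U}) \geq 1 - 2^{-i}.$$

By the Chapman-Kolmogorov equation, it follows for every $n \geq 2$ that

$$\mu^{(1+2+\cdots+(n-1))}_{M_1}(M_n|\mathfrak{U}) \geq \prod_{i=1}^{n-1}\left(1 - 2^{-i}\right) > \prod_{i=1}^{\infty}\left(1 - 2^{-i}\right) = 0.2887\ldots$$

With at least this probability, the Markov process corresponding to $\mathfrak{U}$ will follow the sequence of computers $\{M_i\}_{i \in \mathbb{N}}$ forever, without ever returning to $M_1$. (Note that the intermediately emulated computers, i.e. those that $M_1$ emulates via $1 \otimes 11 \otimes \cdots \otimes 1^{i-1} \otimes p$ for a nonempty proper prefix $p$ of the $i$-th block, are also different from $M_1$: we have $M_1(\lambda) = \lambda$, whereas each of them outputs a nonempty string of ones on the empty input.) Thus, the eventual return probability to $M_1$ is strictly less than 1. □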
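The construction of $M_1$ and the resulting bound on the path probability can be checked mechanically. The following Python sketch is only an illustration under our own assumptions: computers are modeled as Python functions on bit strings, `toy_U` is a hypothetical stand-in for the universal computer $U$ (the checks below do not depend on what it computes), and the helper names `split_blocks`, `M1`, `M` and `path_probability_bound` are ours, not the paper's.

```python
import math

def split_blocks(s):
    """Split s into blocks s_1, ..., s_k with |s_i| = i and a tail with |tail| < k + 1."""
    blocks, pos, i = [], 0, 1
    while pos + i <= len(s):
        blocks.append(s[pos:pos + i])
        pos += i
        i += 1
    return blocks, s[pos:]

def M1(s, U):
    """Toy model of M_1: run U on everything after the first all-zero block, else output 1^k."""
    blocks, tail = split_blocks(s)
    for i, block in enumerate(blocks, start=1):
        if block == "0" * i:              # smallest i with s_i = 0^i
            return U("".join(blocks[i:]) + tail)
    return "1" * len(blocks)

def M(i, x, U):
    """M_i(x) = M_1(1 11 ... 1^(i-1) x)."""
    prefix = "".join("1" * j for j in range(1, i))
    return M1(prefix + x, U)

def path_probability_bound(n):
    """prod_{i=1}^{n-1} (1 - 2^-i): lower bound for the walk M_1 -> M_2 -> ... -> M_n."""
    return math.prod(1 - 2.0 ** -i for i in range(1, n))

def toy_U(y):
    """Hypothetical stand-in for the universal computer U."""
    return "U(" + y + ")"

print(split_blocks("101101101011"))   # prints (['1', '01', '101', '1010'], '11')
print(M(3, "", toy_U))                # prints 11, i.e. M_3(lambda) = 1^2
print(path_probability_bound(60))     # prints approximately 0.288788
```

Any string of length 3 other than 000 sends the toy $M_3$ to the same computer as 111 does, namely $M_4$, which is exactly the property used in the proof.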

In this proof, every computer $M_{i+1}$ is a modified copy of its ancestor $M_i$. In some sense, $M_1$ can be seen as some kind of "computer virus" that undermines the existence of a stationary computer probability. The theorem's name "Markoff Chaney Virus" was inspired by a fictitious character in Robert Anton Wilson's "Illuminatus!" trilogy¹.

The set $\mathfrak{U}$ is in some sense too large to allow the existence of a stationary algorithmic probability distribution. Yet, there exist computer sets $\Phi$ that are actually positive recurrent and thus have such a probability distribution; here is an explicit example:

Example 3.14 (A Positive Recurrent Computer Set): Fix an arbitrary string $u \in \{0,1\}^*$ with $|u| \geq 2$, and let $U$ be a universal computer, i.e. $U \in \mathfrak{U}$, with the property that it emulates every other computer via some string that does not contain $u$ as a substring, i.e.

$$\forall D \in \mathfrak{C}\ \exists d \in \{0,1\}^*: \quad U \xrightarrow{d} D \text{ and } u \text{ is not a substring of } d.$$

If $C \in \mathfrak{C}$ is any computer, define a corresponding computer $C_{u,U}$ by $C_{u,U}(x) := U(y)$ if $x = w \otimes u \otimes y$ and $y$ does not contain $u$ as a substring, and $C_{u,U}(x) := C(x)$ otherwise (that is, if $x$ does not contain $u$). The string $u$ is a "synchronizing word" for the computer $C_{u,U}$, in the sense that any occurrence of $u$ in the input forces $C_{u,U}$ to "reset" and to emulate $U$.

¹ "The Midget, whose name was Markoff Chaney, was no relative of the
famous Chaneys of Hollywood, but people did keep making jokes about that.
[...] Damn the science of mathematics itself, the line, the square, the average, 
the whole measurable world that pronounced him a bizarre random factor. 
Once and for all, beyond fantasy, in the depth of his soul he declared war on 
the "statutory ape", on law and order, on predictability, on negative entropy. 
He would be a random factor in every equation; from this day forward, unto 
death, it would be civil war: the Midget versus the Digits... " 
We get a set of computers

$$\Phi_{u,U} := \{C_{u,U} \mid C \in \mathfrak{C}\}.$$

Whenever $x$ does not contain $u$ as a substring, it holds

$$C \xrightarrow{x} D \;\Longrightarrow\; C_{u,U} \xrightarrow{x} D_{u,U}.$$

It follows that $V := U_{u,U}$ is a universal computer for $\Phi_{u,U}$, i.e. it emulates every computer in $\Phi_{u,U}$. Thus $\Phi_{u,U}$ is connected, and it is easy to see that $C \xrightarrow{u} V$ for every $C \in \Phi_{u,U}$ and that $\#\Phi_{u,U} \geq 2$. According to Proposition 3.11, $\Phi_{u,U}$ is irreducible and branching. An argument similar to that before Theorem 3.13 (where it was proved that $\mathfrak{U}$ is aperiodic) proves that $\Phi_{u,U}$ is also aperiodic. Moreover, by construction it holds for every computer $C \in \Phi_{u,U}$ and $l := |u|$

$$\mu^{(l)}_C(V|\Phi_{u,U}) \geq 2^{-l}.$$

The Chapman-Kolmogorov equation then yields

$$\mu^{(n+l)}_C(V|\Phi_{u,U}) = \sum_{X \in \Phi_{u,U}} \mu^{(n)}_C(X|\Phi_{u,U})\,\mu^{(l)}_X(V|\Phi_{u,U}) \geq 2^{-l}.$$

Consequently, $\limsup_{n\to\infty}\mu^{(n)}_C(V|\Phi_{u,U}) \geq 2^{-|u|} > 0$. According to Theorem 3.12, it follows that $\Phi_{u,U}$ is positive recurrent. In particular, $\mu(V|\Phi_{u,U}) \geq 2^{-|u|}$. Note also that $\#\Phi_{u,U} = \infty$, so we do not have the trivial situation of a finite computer set.
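The reset behaviour of the synchronizing word can be sketched in a few lines of Python. This is a toy model under our own assumptions: computers are Python functions on bit strings, `toy_C` and `toy_U` are hypothetical stand-ins, and when $u$ occurs several times we let the last occurrence decide, which picks one decomposition $x = w \otimes u \otimes y$ with $u$ not contained in $y$. The check of the reset property uses $u = 01$, a word of length 2 that cannot overlap with itself.

```python
def sync_wrapper(C, U, u):
    """Return a toy version of C_{u,U}: reset to U after the last occurrence of u."""
    def C_uU(x):
        j = x.rfind(u)                # start of the last occurrence of u, or -1
        if j == -1:
            return C(x)               # x does not contain u: behave like C
        return U(x[j + len(u):])      # y = part after u; it contains no further u
    return C_uU

def toy_C(x):
    """Hypothetical stand-in for an arbitrary computer C."""
    return "C(" + x + ")"

def toy_U(y):
    """Hypothetical stand-in for the universal computer U."""
    return "U(" + y + ")"

u = "01"
C_uU = sync_wrapper(toy_C, toy_U, u)
V = sync_wrapper(toy_U, toy_U, u)     # V = U_{u,U}

print(C_uU("1100"))      # prints C(1100): no occurrence of u
print(C_uU("110110"))    # prints U(10): reset after the last occurrence of u
```

For inputs $x$ that do not contain $u$, feeding $u \otimes x$ to the toy $C_{u,U}$ gives the same result as feeding $x$ to $V$, which is the emulation $C \xrightarrow{u} V$ used in the example.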

Obviously, the computer set $\Phi_{u,U}$ in the previous example depends on the choice of the string $u$ and the computer $U$; different choices yield different computer sets and different probabilities. In the next section, we will see in Theorem 4.8 that every positive recurrent computer set contains an unavoidable "amount of arbitrariness", and this fact has an interesting physical interpretation.

Given any positive recurrent computer set $\Phi$ (as in the previous example), the actual numerical values of the corresponding stationary computer probability $\mu(\cdot|\Phi)$ will in general be noncomputable. For this reason, the following lemma may be interesting, giving a rough bound on stationary computer probability in terms of emulation complexity:

Lemma 3.15: Let $\Phi \subset \mathfrak{C}$ be positive recurrent². Then, for every $C, D \in \Phi$, we have the inequality

$$2^{-K_C(D)} \leq \frac{\mu(D|\Phi)}{\mu(C|\Phi)} \leq 2^{K_D(C)}.$$

Proof. We start with the limit $m \to \infty$ in the Chapman-Kolmogorov equation and obtain

$$\mu(D|\Phi) = \sum_{X \in \Phi}\mu(X|\Phi)\,\mu^{(n)}_X(D|\Phi) \geq \mu(C|\Phi)\,\mu^{(n)}_C(D|\Phi)$$

for every $n \in \mathbb{N}_0$. Next, we specialize $n := K_C(D)$; then $\mu^{(n)}_C(D|\Phi) \geq 2^{-n}$. This proves the left-hand side of the inequality. The right-hand side can be obtained simply by interchanging $C$ and $D$. □
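As a sanity check of Lemma 3.15, here is a minimal Python sketch on a hypothetical finite toy model (it is not a set of universal computers): four states, each reading one bit and moving to the state listed in `DELTA`, with each bit chosen with probability 1/2. The stationary distribution is approximated by power iteration, and $K_C(D)$ is taken to be the length of a shortest bit string leading from $C$ to $D$. The model and all names are our own choices for illustration.

```python
from collections import deque

# Hypothetical toy model: state c emulates DELTA[c][b] after reading bit b.
DELTA = {0: (0, 1), 1: (0, 2), 2: (0, 3), 3: (0, 0)}

def stationary(iterations=2000):
    """Approximate the stationary distribution by iterating mu <- mu P."""
    mu = [1.0 / len(DELTA)] * len(DELTA)
    for _ in range(iterations):
        nxt = [0.0] * len(DELTA)
        for c, (d0, d1) in DELTA.items():
            nxt[d0] += mu[c] / 2
            nxt[d1] += mu[c] / 2
        mu = nxt
    return mu

def emulation_complexity(c, d):
    """Length of a shortest bit string x with c -x-> d (breadth-first search)."""
    dist = {c: 0}
    queue = deque([c])
    while queue:
        e = queue.popleft()
        if e == d:
            return dist[e]
        for nxt in DELTA[e]:
            if nxt not in dist:
                dist[nxt] = dist[e] + 1
                queue.append(nxt)
    raise ValueError("d is not reachable from c")

def lemma_holds(mu, tol=1e-9):
    """Check 2^-K_C(D) <= mu(D)/mu(C) <= 2^K_D(C) for all pairs of states."""
    return all(
        2.0 ** -emulation_complexity(c, d) - tol
        <= mu[d] / mu[c]
        <= 2.0 ** emulation_complexity(d, c) + tol
        for c in DELTA for d in DELTA
    )

mu = stationary()
print([round(p, 4) for p in mu])   # exact stationary values: 8/15, 4/15, 2/15, 1/15
print(lemma_holds(mu))             # prints True
```

In this toy model both bounds are attained for some pairs, e.g. $\mu(3)/\mu(0) = 2^{-3} = 2^{-K_0(3)}$, so the lemma cannot be sharpened in general.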



² In the following, by stating that some computer set $\Phi \subset \mathfrak{C}$ is positive recurrent, we shall always assume that $\Phi$ is also branching, irreducible and aperiodic.



IV. Symmetries and String Probability 

The aim of this section is twofold: on the one hand, we will derive an alternative proof of the non-existence of a stationary computer probability distribution on $\mathfrak{U}$ (which we have already proved in Theorem 3.13). The benefit of this alternative proof will be to generalize our no-go result much further: it will supply us with an interesting physical interpretation of why getting rid of machine-dependence must be impossible. We discuss this in more detail in Section V.

On the other hand, we would like to explore what happens for computer sets $\Phi$ that actually are positive recurrent. In particular, we show that such sets generate a natural algorithmic probability on the strings; after all, finding such a probability distribution was our aim from the beginning (cf. the Introduction). Actually, this string probability turns out to be useful in proving our no-go generalization. Moreover, it shows that the hard part is really to define computer probability: once this is achieved, string probability follows almost trivially.

Here is how we define string probability. While computer probability $\mu(C|\Phi)$ was defined as the probability of encountering $C$ on a random walk on the $\Phi$-tree, we analogously define the probability of a string $s$ as the probability of getting the output $s$ on this random walk:

Definition 4.1 (String Probability): Let $\Phi \subset \mathfrak{C}$ be branching and let $C \in \Phi$. The $n$-step string probability of $s \in \{0,1\}^*$ is defined as the probability of arriving at output $s$ on a random walk of $n$ steps on the $\Phi$-tree of $C$, i.e.

$$\mu^{(n)}_C(s|\Phi) := \sum_{x \in \{0,1\}^n \cap C^{-1}(\Phi):\ C(x) = s} \mu^{(n)}_{C,\Phi}(x),$$

where $\mu^{(n)}_{C,\Phi}(x)$ denotes the probability that this random walk arrives at the node $x$ after $n$ steps.

Theorem 4.2 (Stationary Algorithmic String Probability): If $\Phi \subset \mathfrak{C}$ is positive recurrent, then for every $C \in \Phi$ and $s \in \{0,1\}^*$ the limit

$$\lim_{n\to\infty}\mu^{(n)}_C(s|\Phi) = \sum_{U \in \Phi:\ U(\lambda) = s}\mu(U|\Phi)$$

exists and is independent of $C$.

Proof. It is easy to see from the definition of $n$-step string probability that

$$\mu^{(n)}_C(s|\Phi) = \sum_{U \in \Phi:\ U(\lambda) = s}\mu^{(n)}_C(U|\Phi).$$

Taking the limit $n \to \infty$, Theorem 3.12 yields equality of the left- and right-hand side, and thus existence of the limit and independence of $C$. □



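We denote this limit by

$$\mu(s|\Phi) := \sum_{U \in \Phi:\ U(\lambda) = s}\mu(U|\Phi).$$

In general, $\mu(\cdot|\Phi)$ is a probability distribution on $\{0,1\}^* \cup \{\infty\}$ rather than on $\{0,1\}^*$, i.e. the undefined string $\infty$ (corresponding to computers $U$ with $U(\lambda)$ undefined) can have positive probability, $\mu(\infty|\Phi) > 0$, in which case $\sum_{s \in \{0,1\}^*}\mu(s|\Phi) < 1$.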
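The passage from computer probability to string probability is a simple aggregation over the output on the empty input. A minimal sketch on a hypothetical four-state toy model: `DELTA` gives the emulated state after each bit, `OUTPUT` (invented for illustration) gives each state's output on the empty input, and `None` plays the role of the undefined output $\infty$. Starting the walk at any state, the $n$-step string probabilities approach the same limit, and the defined strings carry total probability less than one.

```python
from collections import defaultdict

# Hypothetical toy model: state c emulates DELTA[c][b] after reading bit b,
# and OUTPUT[c] is its output on the empty input (None = undefined).
DELTA = {0: (0, 1), 1: (0, 2), 2: (0, 3), 3: (0, 0)}
OUTPUT = {0: "0", 1: "1", 2: "1", 3: None}

def n_step_distribution(start, n):
    """Distribution of the emulated computer after n uniformly random bits."""
    mu = {c: 0.0 for c in DELTA}
    mu[start] = 1.0
    for _ in range(n):
        nxt = {c: 0.0 for c in DELTA}
        for c, (d0, d1) in DELTA.items():
            nxt[d0] += mu[c] / 2
            nxt[d1] += mu[c] / 2
        mu = nxt
    return mu

def string_probability(mu):
    """Aggregate computer probabilities by the output on the empty input."""
    probs = defaultdict(float)
    for c, p in mu.items():
        probs[OUTPUT[c]] += p
    return dict(probs)

for start in DELTA:
    print(start, string_probability(n_step_distribution(start, 500)))
```

In this toy model the limits are $8/15$ for the string 0, $6/15$ for the string 1 and $1/15$ for the undefined output, independently of the starting state.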

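We continue by showing a Chapman-Kolmogorov-like equation (analogous to the Chapman-Kolmogorov equation for computer probability) for the string probability. Note that this equation differs from the much deeper result of Theorem A.6 in the following sense: it describes a weighted average of probabilities $\mu^{(n)}_U(s|\Phi)$, and those probabilities do not only depend on the computer $U$ (as in Theorem A.6), but also on the choice of the subset $\Phi$.

Proposition 4.3 (Chapman-Kolmogorov for String Prob.): If $\Phi \subset \mathfrak{C}$ is positive recurrent, then

$$\mu^{(m+n)}_C(s|\Phi) = \sum_{U \in \Phi}\mu^{(m)}_C(U|\Phi)\,\mu^{(n)}_U(s|\Phi)$$

for every $C \in \Phi$, $m, n \in \mathbb{N}_0$ and $s \in \{0,1\}^*$.

Proof. For $x, y \in \{0,1\}^*$, we use the notation

$$\delta_{x,y} := \begin{cases} 0 & \text{if } x \neq y, \\ 1 & \text{if } x = y, \end{cases}$$

and calculate

$$\mu^{(m+n)}_C(s|\Phi) = \sum_{x \in \{0,1\}^{m+n} \cap C^{-1}(\Phi)}\mu^{(m+n)}_{C,\Phi}(x)\,\delta_{C(x),s} = \sum_{U \in \Phi}\mu^{(m)}_C(U|\Phi)\sum_{y \in \{0,1\}^n \cap U^{-1}(\Phi)}\mu^{(n)}_{U,\Phi}(y)\,\delta_{U(y),s}.$$

The second sum equals $\mu^{(n)}_U(s|\Phi)$, and the claim follows. □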

For prefix computers $C$, the algorithmic probability $P_C(s)$ of any string $s$ as defined in the Introduction and the expression $2^{-K_C(s)}$ differ only by a multiplicative constant [4]. Here is an analogous inequality for stationary string probability:

Lemma 4.4: Let $\Phi \subset \mathfrak{C}$ be positive recurrent and $C \in \Phi$ some arbitrary computer; then

$$\mu(s|\Phi) \geq \mu(C|\Phi)\cdot 2^{-K_C(s)} \quad \text{for all } s \in \{0,1\}^*.$$

Proof. We start with the limit $m \to \infty$ in the Chapman-Kolmogorov equation given in Proposition 4.3 and get

$$\mu(s|\Phi) = \sum_{U \in \Phi}\mu(U|\Phi)\,\mu^{(n)}_U(s|\Phi) \geq \mu(C|\Phi)\,\mu^{(n)}_C(s|\Phi)$$

for every $n \in \mathbb{N}_0$. Then we specialize $n := K_C(s)$ and use $\mu^{(n)}_C(s|\Phi) \geq 2^{-n}$ for this choice of $n$. □

Looking for further properties of stationary string probability, it seems reasonable to conjecture that, for many computer sets $\Phi$, a string $s \in \{0,1\}^*$ (like $s = 10111$) and its inverse $\bar{s}$ (in this case $\bar{s} = 01000$) have the same probability $\mu(s|\Phi) = \mu(\bar{s}|\Phi)$, since both seem to be in some sense algorithmically equivalent. A general approach to prove such conjectures is to study output transformations:

Definition 4.5 (Output Transformation $\sigma$):
Let $\sigma: \{0,1\}^* \to \{0,1\}^*$ be a computable permutation. For every $C \in \mathfrak{C}$, the map $\sigma \circ C$ is itself a computer, defined by $(\sigma \circ C)(x) := \sigma(C(x))$. The map $C \mapsto \sigma \circ C$ will be called an output transformation and will also be denoted $\sigma$. Moreover, for computer sets $\Phi \subset \mathfrak{C}$, we use the notation

$$\sigma \circ \Phi := \{\sigma \circ C \mid C \in \Phi\}.$$

Under reasonable conditions, string and computer probability 
are invariant with respect to output transformations: 



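Theorem 4.6 (Output Symmetry): Let $\Phi \subset \mathfrak{C}$ be positive recurrent and closed³ with respect to some output transformation $\sigma$ and its inverse $\sigma^{-1}$. Then, we have for every $C \in \Phi$

$$\mu(C|\Phi) = \mu(\sigma \circ C|\Phi)$$

and for every $s \in \{0,1\}^*$

$$\mu(s|\Phi) = \mu(\sigma(s)|\Phi).$$

Proof. Note that $\Phi = \sigma \circ \Phi$. Let $C, D \in \Phi$. Suppose that $C \xrightarrow{b} D$ for some bit $b \in \{0,1\}$. Then,

$$\sigma \circ C(b \otimes x) = \sigma(D(x)) = \sigma \circ D(x).$$

Thus, we have $\sigma \circ C \xrightarrow{b} \sigma \circ D$, and the same argument applied to $\sigma^{-1}$ shows the converse. Enumerating $\Phi = \{C_1, C_2, C_3, \ldots\}$, it follows for the 1-step transition probabilities that

$$\mu^{(1)}_{C_i}(C_j|\Phi) = \mu^{(1)}_{\sigma \circ C_i}(\sigma \circ C_j|\Phi)$$

for every $i, j$. Thus, the emulation matrix $E_\Phi$ does not change if every computer $C$ (or rather its number in the list of all computers) is exchanged with (the number of) its transformed computer $\sigma \circ C$, yielding the transformed emulation matrix $E_{\sigma \circ \Phi}$. But then, $E_\Phi$ and $E_{\sigma \circ \Phi}$ must have the same unique stationary probability eigenvector

$$\left(\mu(C_k|\Phi)\right)_{k \in \mathbb{N}} = \left(\mu(\sigma \circ C_k|\Phi)\right)_{k \in \mathbb{N}}.$$

This proves the first identity, while the second identity follows from the calculation

$$\mu(s|\Phi) = \sum_{U \in \Phi:\ U(\lambda) = s}\mu(U|\Phi) = \sum_{U \in \Phi:\ U(\lambda) = s}\mu(\sigma \circ U|\Phi) = \sum_{V \in \sigma \circ \Phi:\ V(\lambda) = \sigma(s)}\mu(V|\Phi) = \mu(\sigma(s)|\Phi). \quad \square$$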

Thus, if some positive recurrent computer set $\Phi \subset \mathfrak{C}$ contains, e.g., for every computer $C$ also the computer $\bar{C}$ which always outputs the bitwise inverse of the output of $C$, then $\mu(s|\Phi) = \mu(\bar{s}|\Phi)$ holds. In some sense, this shows that the approach taken in this paper successfully eliminates properties of single computers (e.g. to prefer the string 10111 over 01000) and leaves only general algorithmic properties related to the set of computers.
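Theorem 4.6 can be illustrated on a hypothetical finite toy model. In the following Python sketch, four states carry outputs on the empty input, `SIGMA` maps each state to its bitwise-inverted partner $\sigma \circ C$, and the transition table `DELTA` is chosen by us so that $C \xrightarrow{b} D$ implies $\sigma \circ C \xrightarrow{b} \sigma \circ D$. The code checks this equivariance and then confirms that a string and its inverse receive the same stationary probability; it is a sketch of the symmetry argument, not of the paper's infinite computer sets.

```python
from collections import defaultdict

# Hypothetical toy model: transitions on bits 0/1 and outputs on the empty input.
DELTA = {0: (0, 2), 1: (1, 3), 2: (1, 0), 3: (0, 1)}
OUTPUT = {0: "10", 1: "01", 2: "1", 3: "0"}
SIGMA = {0: 1, 1: 0, 2: 3, 3: 2}   # index of sigma∘C, sigma = bitwise inversion

def invert(s):
    """Bitwise inverse of a binary string."""
    return "".join("1" if ch == "0" else "0" for ch in s)

def is_equivariant():
    """sigma∘C has the inverted output of C, and C -b-> D implies sigma∘C -b-> sigma∘D."""
    return all(
        OUTPUT[SIGMA[c]] == invert(OUTPUT[c])
        and all(SIGMA[DELTA[c][b]] == DELTA[SIGMA[c]][b] for b in (0, 1))
        for c in DELTA
    )

def stationary(iterations=2000):
    """Approximate the stationary distribution by power iteration."""
    mu = {c: 1.0 / len(DELTA) for c in DELTA}
    for _ in range(iterations):
        nxt = {c: 0.0 for c in DELTA}
        for c, (d0, d1) in DELTA.items():
            nxt[d0] += mu[c] / 2
            nxt[d1] += mu[c] / 2
        mu = nxt
    return mu

def string_probability(mu):
    """Aggregate computer probabilities by the output on the empty input."""
    probs = defaultdict(float)
    for c, p in mu.items():
        probs[OUTPUT[c]] += p
    return dict(probs)

probs = string_probability(stationary())
print(is_equivariant())            # prints True
print(probs["10"], probs["01"])    # both approximately 1/3
```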

Moreover, Theorem 4.6 allows for an alternative proof that $\mathfrak{U}$ and similar computer sets cannot be positive recurrent. We call a set of computable permutations $S := \{\sigma_i\}_{i \in \mathbb{N}}$ cyclic if every string $s \in \{0,1\}^*$ is mapped to infinitely many other strings by application of finite compositions of those permutations, i.e. if for every $s \in \{0,1\}^*$

$$\#\{\sigma_{i_1} \circ \sigma_{i_2} \circ \cdots \circ \sigma_{i_N}(s) \mid N \in \mathbb{N},\ i_1, \ldots, i_N \in \mathbb{N}\} = \infty,$$

and if $S$ contains with each permutation $\sigma$ also its inverse $\sigma^{-1}$. Then, many computer subsets cannot be positive recurrent:

Theorem 4.7 (Output Symmetry and Positive Recurrence): Let $\Phi \subset \mathfrak{C}$ be closed with respect to a cyclic set of output transformations; then $\Phi$ is not positive recurrent.

Proof. Suppose $\Phi$ is positive recurrent. Let $S := \{\sigma_i\}_{i \in \mathbb{N}}$ be the corresponding cyclic set of output transformations. Let $s \in \{0,1\}^*$ be an arbitrary string; then for every composition $\sigma := \sigma_{i_1} \circ \cdots \circ \sigma_{i_N}$, we have by Theorem 4.6

$$\mu(s|\Phi) = \mu(\sigma(s)|\Phi).$$

Since $S$ is cyclic, there are infinitely many such transformations $\sigma$, producing infinitely many strings $\sigma(s)$ which all have the same probability. It follows that $\mu(s|\Phi) = 0$. Since $s \in \{0,1\}^*$ was arbitrary, this is a contradiction. □

³ We say that a computer set $\Phi \subset \mathfrak{C}$ is closed with respect to some transformation $T: \mathfrak{C} \to \mathfrak{C}$ if $\Phi \supset T(\Phi) := \{T(C) \mid C \in \Phi\}$.

Again, we conclude that $\mathfrak{U}$ is not positive recurrent, since this computer set is closed with respect to all output transformations.
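The step "infinitely many strings with the same probability force probability zero" in the proof of Theorem 4.7 can be spelled out as follows. This short derivation is added for illustration only; $t_1, \ldots, t_N$ denote $N$ distinct strings of the form $\sigma(s)$, which exist for every $N \in \mathbb{N}$ because $S$ is cyclic, and each of them satisfies $\mu(t_j|\Phi) = \mu(s|\Phi)$.

```latex
\begin{aligned}
N \cdot \mu(s|\Phi) &= \sum_{j=1}^{N} \mu(t_j|\Phi)
  \le \sum_{t \in \{0,1\}^*} \mu(t|\Phi)
  = \sum_{U \in \Phi:\ U(\lambda) \neq \infty} \mu(U|\Phi) \le 1, \\
\text{hence}\quad \mu(s|\Phi) &\le \frac{1}{N}
  \quad \text{for every } N \in \mathbb{N},
  \quad \text{i.e. } \mu(s|\Phi) = 0 .
\end{aligned}
```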

Although $\mathfrak{U}$ is not positive recurrent, there might be a unique, natural, "maximal" or "most interesting" subset $\Phi \subset \mathfrak{U}$ which is positive recurrent. What can we say about this idea? In fact, the following theorem says that this is also impossible. As this theorem is only a simple generalization of Theorem 4.6, we omit the proof.

Theorem 4.8 (Non-Uniqueness): If $\Phi \subset \mathfrak{C}$ is positive recurrent, then so is $\sigma \circ \Phi$ for every computable permutation (= output transformation) $\sigma$. Moreover,

$$\mu(C|\Phi) = \mu(\sigma \circ C|\sigma \circ \Phi)$$

for every $C \in \Phi$ and

$$\mu(s|\Phi) = \mu(\sigma(s)|\sigma \circ \Phi)$$

for every $s \in \{0,1\}^*$.

This means that there cannot be a unique "natural" positive recurrent computer set $\Phi$: for every such set there exist output transformations $\sigma$ such that $\sigma \circ \Phi \neq \Phi$ (this follows from Theorem 4.7). But then, Theorem 4.8 proves that $\sigma \circ \Phi$ is positive recurrent, too, and it is thus another candidate for the "most natural" computer set.

V. Conclusions and Interpretation 

We have studied a natural approach to get rid of machine- 
dependence in the definition of algorithmic probability. The 
idea was to look at a Markov process of universal computers 
emulating each other, and to take the stationary distribution as 
a natural probability measure on the computers. 

This approach was only partially successful: as the corresponding Markov process on the set of all computers is not positive recurrent and thus has no unique stationary distribution, one has to choose a subset $\Phi$ of the computers, which introduces yet another source of ambiguity.

However, we have shown (cf. Example 3.14) that there exist non-trivial, infinite sets $\Phi$ of computers that are actually positive recurrent and possess a stationary algorithmic probability distribution. This distribution has beautiful properties and eliminates at least some of the machine-dependence arising from choosing a single, arbitrary universal computer as a reference machine (e.g. Theorem 4.6). It gives probabilities for computers as well as for strings (Theorem 4.2), agrees with the average output frequency (Theorem A.6), and does not assume that the computers have any specific structural property like e.g. being prefix-free.

The second main result can be stated as follows: there is no way to get rid of machine-dependence completely, neither in the approach of this paper nor in any other similar but different approach. To understand why this is true, recall that the main reason for our no-go result was the symmetry of computer probability with respect to output transformations $C \mapsto \sigma \circ C$, where $\sigma$ is a computable permutation on the strings. This can be seen in two places:

• In Theorem 4.7, this symmetry yields the result that any computer set which is "too large" (like $\mathfrak{U}$) cannot be positive recurrent.

• Theorem 4.8 states that if a set $\Phi$ is positive recurrent, then $\sigma \circ \Phi$ must be positive recurrent, too. Since in this case $\Phi \neq \sigma \circ \Phi$ for many $\sigma$, this means that there cannot be a unique "natural" choice of the computer set $\Phi$.

Output transformations have a natural physical interpretation 
as "renaming the objects that the strings are describing". 
To see this, suppose we want to define the complexity of 
the microstate of a box of gas in thermodynamics (this can 
sometimes be useful, see [4]). Furthermore, suppose we are
only interested in a coarse-grained description such that there 
are only countably many possibilities what the positions, 
velocities etc. of the gas particles might look like. Then, 
we can encode every microstate into a binary string, and 
define the complexity of a microstate as the complexity of the 
corresponding string (assuming that we have fixed an arbitrary 
complexity measure K on the strings). 

But there are always many different possibilities how to 
encode the microstate into a string (specifying the velocities 
in different data formats, specifying first the positions and then 
the velocities or the other way round etc.). If every encoding 
is supposed to be one-to-one and can be achieved by some 
machine, then two different encodings will always be related 
to each other by a computable permutation. 

In more detail, if one encoding $e_1$ maps microstates $m$ to encoded strings $e_1(m) \in \{0,1\}^*$, then another encoding $e_2$ will map microstates $m$ to $e_2(m) = \sigma(e_1(m))$, where $\sigma$ is a computable permutation on the strings (that depends on $e_1$ and $e_2$). Choosing encoding $e_1$, a microstate $m$ will be assigned the complexity $K(e_1(m))$, while for encoding $e_2$, it will be assigned the complexity $K(\sigma \circ e_1(m))$. That is, there is an unavoidable ambiguity which arises from the arbitrary choice of an encoding scheme. Switching between the two encodings amounts to "renaming" the microstates, and this is exactly an output transformation in the sense of this paper.
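The renaming argument can be made concrete with a toy example of our own: a coarse-grained "microstate" is a pair (position, velocity) of integers in `range(16)`, encoding `e1` writes the position first and encoding `e2` writes the velocity first, each as a 4-bit block. Both encodings are then related by a computable permutation `sigma` of $\{0,1\}^*$ (swap the two halves of 8-bit strings, leave all other strings unchanged), as described above. All names and the choice of encodings are hypothetical.

```python
def e1(m):
    """Encoding 1: position first, then velocity (4 bits each)."""
    position, velocity = m
    return format(position, "04b") + format(velocity, "04b")

def e2(m):
    """Encoding 2: velocity first, then position (4 bits each)."""
    position, velocity = m
    return format(velocity, "04b") + format(position, "04b")

def sigma(s):
    """A computable permutation of {0,1}*: swap the halves of 8-bit strings, fix all others."""
    if len(s) == 8:
        return s[4:] + s[:4]
    return s

m = (3, 10)
print(e1(m), e2(m), sigma(e1(m)))   # prints 00111010 10100011 10100011
```

Any complexity measure $K$ then assigns the microstate $m$ the value $K(e_1(m))$ or $K(\sigma(e_1(m)))$, depending only on which of the two encodings was chosen.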

Even if we do not have the situation that the strings shall 
describe physical objects, we encounter a similar ambiguity 
already in the definition of a computer: a computer, i.e. a 
partial recursive function, is described by a Turing machine 
computing that function. Whenever we look at the output 
of a Turing machine, we have to "read" the output from 
the machine's tape which can potentially be done in several 
inequivalent ways, comparable to the different "encodings" 
described above. 

Every kind of attempt to get rid of those additive constants 
in Kolmogorov complexity will have to face this ambiguity of 
"renaming". This is why we think that all those attempts must 
fail. 
Appendix

String Probability is the Weighted Average of 
Output Frequency 

This appendix is rather technical and can be skipped on first reading. Its aim is to prove Theorem A.6. This theorem says that the string probability which has been introduced in Definition 4.1 in Section IV is exactly what we really wanted to have from the beginning: in the Introduction, our main motivation to find a probability measure on the computers was to define machine-independent algorithmic probability of strings as the weighted mean over all universal computers. Theorem A.6 says that string probability can be written exactly in this way, given some natural assumptions on the reference set of computers.

Note that Theorem A.6 is a surprising result for the following reason: string probability, as defined in Definition 4.1, only depends on the outputs of the computers on the "universal subtree", that is, on the leaves in Figure 4 which correspond to bold lines. But output frequency, as given on the right-hand side in Theorem A.6 and defined in Definition 2.1, counts the outputs on all leaves; that is, output frequency is a property of a single computer, not of the computer subset that is underlying the emulation Markov process.

In Section IV, we have studied output transformations on computers; the key idea in this appendix will be to study input transformations instead. So what is an input transformation? If $\sigma: \{0,1\}^* \to \{0,1\}^*$ is a computable permutation on the strings and $C \in \mathfrak{C}$ is some computer, we might consider the transformed computer $C \circ \sigma$, given by $(C \circ \sigma)(s) := C(\sigma(s))$. But this turns out not to be useful, since such transformations do not preserve the emulation structure. In fact, the most important and useful property of output transformations in Section IV was that they preserve the emulation structure: it holds

$$C \xrightarrow{x} D \iff \sigma \circ C \xrightarrow{x} \sigma \circ D.$$

But for transformations like $C \mapsto C \circ \sigma$, there is no such identity; hence we have to look for a different approach. It turns out that a successful approach is to look only at a restricted class of permutations, and also to introduce equivalence classes of computers:

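Definition A.1 (Equivalence Classes of Computers):
For every $k \in \mathbb{N}$, two computers $C, D \in \mathfrak{C}$ are called $k$-equivalent, denoted $C \sim_k D$, if $C(x) = D(x)$ for every $x \in \{0,1\}^*$ with $|x| \geq k$. We denote the corresponding equivalence classes by $[C]_k$ and set

$$[\Phi]_k := \{[C]_k \mid C \in \Phi\}.$$

A computer set $\Phi \subset \mathfrak{C}$ is called complete if for every $C \in \Phi$ and $k \in \mathbb{N}$ it holds $[C]_k \subset \Phi$. If $\Phi \subset \mathfrak{C}$ is positive recurrent and complete, we set for every $[C]_k \in [\Phi]_k$

$$\mu([C]_k|\Phi) := \sum_{C' \in [C]_k}\mu(C'|\Phi).$$

Moreover, if $C \sim_k C'$ and $C \xrightarrow{x} D$, $C' \xrightarrow{x} D'$, then $D \sim_k D'$; since $\Phi$ is complete, it follows that $\sum_{D' \in [D]_k}\mu^{(n)}_C(D'|\Phi)$ does not change if $C$ is replaced by any $C' \in [C]_k$. Thus, the definition

$$\mu^{(n)}_{[C]_k}([D]_k|\Phi) := \sum_{D' \in [D]_k}\mu^{(n)}_C(D'|\Phi)$$

makes sense for $n \in \mathbb{N}$ and $[C]_k, [D]_k \in [\Phi]_k$ and is independent of the choice of the representative $C \in [C]_k$. Enumerating the equivalence classes $[\Phi]_k = \{[C_1]_k, [C_2]_k, [C_3]_k, \ldots\}$ in arbitrary order, we can define an associated emulation matrix $E_{\Phi,k}$ as

$$\left(E_{\Phi,k}\right)_{ij} := \mu^{(1)}_{[C_i]_k}([C_j]_k|\Phi).$$

It is easily checked that if $\Phi$ is positive recurrent, then the Markov process described by the transition matrix $E_{\Phi,k}$ must also be irreducible, aperiodic and positive recurrent, and $\mu_{\Phi,k} := \left(\mu([C_1]_k|\Phi), \mu([C_2]_k|\Phi), \mu([C_3]_k|\Phi), \ldots\right)$ is the unique probability vector solution to the equation $\mu_{\Phi,k} \cdot E_{\Phi,k} = \mu_{\Phi,k}$.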

Now we can define input transformations: 

Definition A.2 (Input Transformation $I_\sigma$):
Let $\sigma: \{0,1\}^n \to \{0,1\}^n$ be a permutation such that there is at least one string $x \in \{0,1\}^n$ for which $x_1 \neq \sigma(x)_1$, where $x_1$ denotes the first bit of $x$. For every $s \in \{0,1\}^*$, let $I_\sigma(s)$ be the string that is generated by applying $\sigma$ to the last $n$ bits of $s$ (e.g. if $n = 1$, $\sigma(1) = 0$ (hence $\sigma(0) = 1$) and $s = 1011$, then $I_\sigma(s) = 1010$). If $|s| < n$, then $I_\sigma(s) := s$. For every $C \in \mathfrak{C}$, the $I_\sigma$-transformed computer $I_\sigma(C)$ is defined by

$$\left(I_\sigma(C)\right)(s) := C(I_\sigma(s)) \quad \text{for every } s \in \{0,1\}^*.$$

We call $|\sigma| := n$ the order of $\sigma$. Moreover, we use the notation

$$I_\sigma(\Phi) := \{I_\sigma(C) \mid C \in \Phi\}.$$
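A minimal Python sketch of Definition A.2, with permutations of $\{0,1\}^n$ represented as dictionaries on $n$-bit strings (a representation chosen here for illustration; the names `input_transform`, `transform_computer`, `flip` and `sigma2` are ours). It reproduces the example $n = 1$, $\sigma(1) = 0$, $s = 1011$, and checks the identity $I_\sigma(x \otimes s) = x \otimes I_\sigma(s)$ for $|s| \geq |\sigma|$, which is what makes input transformations compatible with emulation.

```python
from itertools import product

def input_transform(sigma, s):
    """I_sigma(s): apply the permutation sigma (dict on n-bit strings) to the last n bits of s."""
    n = len(next(iter(sigma)))              # order |sigma| = n
    if len(s) < n:
        return s                            # short strings are left unchanged
    return s[:len(s) - n] + sigma[s[len(s) - n:]]

def transform_computer(sigma, C):
    """(I_sigma(C))(s) := C(I_sigma(s))."""
    return lambda s: C(input_transform(sigma, s))

flip = {"0": "1", "1": "0"}                 # n = 1, sigma(1) = 0, sigma(0) = 1
print(input_transform(flip, "1011"))        # prints 1010

# An order-2 permutation that changes the first bit of "00" (and of "10").
sigma2 = {"00": "10", "01": "01", "10": "00", "11": "11"}
suffixes = ["".join(bits) for bits in product("01", repeat=3)]
print(all(input_transform(sigma2, x + s) == x + input_transform(sigma2, s)
          for x in ["", "0", "101"] for s in suffixes))   # prints True
```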



[Figure 5: binary input trees of $C$ and $I_\sigma(C)$ (node labels such as 00, 000, 001).]

Fig. 5. The input transformation $C \mapsto I_\sigma(C)$ for $\sigma(0) = 1$, $\sigma(1) = 0$.

The action of an input transformation is depicted in Figure 5. Changing e.g. the last bit of the input causes a permutation of the outputs corresponding to neighboring branches. As long as $\Phi$ is complete and closed with respect to that input transformation, the emulation structure will not be changed. This is a byproduct of the proof of the following theorem:

Theorem A.3 (Input Symmetry): Let $\Phi \subset \mathfrak{C}$ be positive recurrent, complete and closed with respect to an input transformation $I_\sigma$. Then, for every $k \geq |\sigma|$ and every $C \in \Phi$,

$$\mu([C]_k|\Phi) = \mu([I_\sigma(C)]_k|\Phi).$$
Proof. Suppose that $[C]_k \xrightarrow{0} [C_0]_k$, i.e. $C(0 \otimes x) = C_0(x)$ for every $|x| \geq k$, $C \in [C]_k$ and $C_0 \in [C_0]_k$. As $|\sigma| \leq k$,

$$\left(I_\sigma(C)\right)(0 \otimes x) = C(I_\sigma(0 \otimes x)) = C(0 \otimes I_\sigma(x)) = C_0(I_\sigma(x)) = I_\sigma(C_0)(x),$$

so $[I_\sigma(C)]_k \xrightarrow{0} [I_\sigma(C_0)]_k$. Analogously, from $[C]_k \xrightarrow{1} [C_1]_k$ it follows that $[I_\sigma(C)]_k \xrightarrow{1} [I_\sigma(C_1)]_k$, and vice versa. Thus,

$$\mu^{(1)}_{[C]_k}([D]_k|\Phi) = \mu^{(1)}_{[I_\sigma(C)]_k}([I_\sigma(D)]_k|\Phi) \quad \text{for all } [C]_k, [D]_k \in [\Phi]_k.$$

So interchanging every equivalence class of computers with its transformed class leaves the emulation matrix invariant. A similar argument as in Theorem 4.6 proves the claim. □

We are now heading towards an analogue of the weighted-mean formula for algorithmic probability from the Introduction, i.e. towards a proof that our algorithmic string probability equals the weighted average of output frequency. This needs some preparation:

Definition A.4 (Input Symmetry Group): Let $I_\sigma$ be an input transformation of order $n \in \mathbb{N}$. A computer $C \in \mathfrak{C}$ is called $I_\sigma$-symmetric if $I_\sigma(C) = C$ (which is equivalent to $[I_\sigma(C)]_n = [C]_n$). The input symmetry group of $C$ is defined as

$$I\text{-SYM}(C) := \{I_\sigma \text{ input transformation} \mid I_\sigma(C) = C\}.$$

Every transformation of order $n \in \mathbb{N}$ can also be interpreted as a transformation on $\{0,1\}^N$ for $N > n$, by setting

$$\sigma(x_1 \otimes x_2 \otimes \cdots \otimes x_N) := (x_1 \otimes \cdots \otimes x_{N-n}) \otimes \sigma(x_{N-n+1} \otimes \cdots \otimes x_N)$$

whenever $x_i \in \{0,1\}$. With this identification, $I\text{-SYM}(C)$ is a group.

Proposition A.5 (Input Symmetry and Irreducibility): Let $\Phi \subset \mathfrak{C}$ be irreducible. Then $I\text{-SYM}(C)$ is the same for every $C \in \Phi$ and can be denoted $I\text{-SYM}(\Phi)$.

Proof. Let $\Phi \subset \mathfrak{C}$ be irreducible, and let $C \in \Phi$ be $I_\sigma$-symmetric, i.e. $C(I_\sigma(s)) = C(s)$ for every $s \in \{0,1\}^*$. Let $D \in \Phi$ be an arbitrary computer. Since $\Phi$ is irreducible, it holds $C \to D$, i.e. there is a string $x \in \{0,1\}^*$ with $C(x \otimes s) = D(s)$ for every $s \in \{0,1\}^*$. Let $|s| \geq |\sigma|$; then

$$D(s) = C(x \otimes s) = C(I_\sigma(x \otimes s)) = C(x \otimes I_\sigma(s)) = D(I_\sigma(s)),$$

while for $|s| < |\sigma|$ we have $I_\sigma(s) = s$ anyway. Hence $D$ is also $I_\sigma$-symmetric. □

For most irreducible computer sets like $\Phi = \mathfrak{U}$, the input symmetry group will only consist of the identity, i.e. $I\text{-SYM}(\Phi) = \{\mathrm{Id}\}$.

Now we are ready to state the most interesting result of this section:

Theorem A.6 (Equivalence of Definitions): If $\Phi \subset \mathfrak{C}$ is positive recurrent, complete and closed with respect to every input transformation $I_\sigma$ with $|\sigma| \leq n \in \mathbb{N}_0$, then

$$\mu(s|\Phi) = \sum_{U \in \Phi}\mu(U|\Phi)\,\mu^{(n)}_U(s) \quad \text{for every } s \in \{0,1\}^*,$$

where $\mu^{(n)}_U(s)$ is the output frequency as introduced in Definition 2.1.

Proof. The case $n = 0$ is trivial, so let $n \geq 1$. It is convenient to introduce another equivalence relation on the computer classes. We define the corresponding equivalence classes ("transformation classes") as

$$\{V\}_k := \{[X]_k \in [\Phi]_k \mid \exists I_\sigma: |\sigma| \leq k,\ [I_\sigma(V)]_k = [X]_k\}.$$

Thus, two computer classes $[X]_k$ and $[Y]_k$ are elements of the same transformation class if one is an input transformation (of order at most $k$) of the other. Again, we set $\{\Phi\}_k := \{\{X\}_k \mid X \in \Phi\}$.

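For every $X \in [X]_n$, the probability $\mu^{(n)}_X(s|\Phi)$ is the same and can be denoted $\mu^{(n)}_{[X]_n}(s|\Phi)$. According to Proposition 4.3 (in the limit $m \to \infty$), we have

$$\mu(s|\Phi) = \sum_{\{X\}_n \in \{\Phi\}_n}\ \sum_{[Y]_n \in \{X\}_n}\mu([Y]_n|\Phi)\,\mu^{(n)}_{[Y]_n}(s|\Phi).$$

Due to Theorem A.3, the probability $\mu([Y]_n|\Phi)$ is the same for every $[Y]_n \in \{X\}_n$. Let $[X]_n$ be an arbitrary representative of $\{X\}_n$; then

$$\mu(\{X\}_n|\Phi) := \sum_{[Y]_n \in \{X\}_n}\mu([Y]_n|\Phi) = \#\{X\}_n \cdot \mu([X]_n|\Phi).$$

The two equations yield

$$\mu(s|\Phi) = \sum_{\{X\}_n \in \{\Phi\}_n}\frac{\mu(\{X\}_n|\Phi)}{\#\{X\}_n}\sum_{[Y]_n \in \{X\}_n}\mu^{(n)}_{[Y]_n}(s|\Phi).$$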


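Let $S_{2^n}$ be the set of all permutations on $\{0,1\}^n$. Two permutations $\sigma_1, \sigma_2 \in S_{2^n}$ are called $\Phi$-equivalent if there exists a $\sigma \in I\text{-SYM}(\Phi)$ such that $\sigma_1 = \sigma \circ \sigma_2$ (recall that $\Phi$ is irreducible). This is the case if and only if $I_{\sigma_1}(C) = I_{\sigma_2}(C)$ for one and thus every computer $C \in \Phi$. The set of all $\Phi$-equivalence classes will be denoted $S_n(\Phi)$. Every computer class $[Y]_n \in \{X\}_n$ is generated from $[X]_n$ by some input transformation. If $X$ is an arbitrary representative of $[X]_n$, we thus have

$$\mu(s|\Phi) = \sum_{\{X\}_n \in \{\Phi\}_n}\frac{\mu(\{X\}_n|\Phi)}{\#\{X\}_n}\sum_{[\sigma] \in S_n(\Phi)}\mu^{(n)}_{I_\sigma(X)}(s|\Phi),$$

where $\sigma \in [\sigma]$ is an arbitrary representative. For every equivalence class $[\sigma]$, it holds $\#[\sigma] = \#(S_{2^n} \cap I\text{-SYM}(\Phi))$, thus

$$\mu(s|\Phi) = \sum_{\{X\}_n \in \{\Phi\}_n}\frac{\mu(\{X\}_n|\Phi)}{\#\{X\}_n \cdot \#(I\text{-SYM}(\Phi) \cap S_{2^n})}\sum_{\sigma \in S_{2^n}}\mu^{(n)}_{I_\sigma(X)}(s|\Phi).$$

Using that $\#\{X\}_n = \#S_n(\Phi)$, the definition of the set $S_n(\Phi)$ gives

$$\#\{X\}_n \cdot \#(I\text{-SYM}(\Phi) \cap S_{2^n}) = \#S_{2^n} = (2^n)!,$$

and we obtain

$$\mu(s|\Phi) = \sum_{\{X\}_n \in \{\Phi\}_n}\frac{\mu(\{X\}_n|\Phi)}{(2^n)!}\sum_{\sigma \in S_{2^n}}\ \sum_{x \in \{0,1\}^n}\mu^{(n)}_{I_\sigma(X),\Phi}(x)\,\delta_{I_\sigma(X)(x),s}.$$

As $|x| = n \geq |\sigma|$, it holds $I_\sigma(X)(x) = X(I_\sigma(x)) = X(\sigma(x))$. Moreover, since $\Phi$ is complete and closed with respect to $I_\sigma$, the $\Phi$-trees of $X$ and $I_\sigma(X)$ contain the same nodes (compare the proof of Theorem A.3), so that $\mu^{(n)}_{I_\sigma(X),\Phi}(x) = \mu^{(n)}_{X,\Phi}(x)$. The substitution $y := \sigma(x)$ yields

$$\mu(s|\Phi) = \sum_{\{X\}_n \in \{\Phi\}_n}\mu(\{X\}_n|\Phi)\sum_{y \in \{0,1\}^n}\delta_{X(y),s}\,\frac{1}{(2^n)!}\sum_{\sigma \in S_{2^n}}\mu^{(n)}_{X,\Phi}(\sigma^{-1}(y)).$$

Up to normalization, the rightmost sum is the average of all permutations of the probability vector $\mu^{(n)}_{X,\Phi}$; thus

$$\frac{1}{(2^n)!}\sum_{\sigma \in S_{2^n}}\mu^{(n)}_{X,\Phi}(\sigma^{-1}(y)) = 2^{-n}.$$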

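Recall that $X$ was an arbitrary representative of an arbitrary representative of $\{X\}_n$, and that $2^{-n}\sum_{y \in \{0,1\}^n}\delta_{X(y),s} = \mu^{(n)}_X(s)$ is the output frequency. Moreover, if $X$ and $Y$ are representatives of representatives of an arbitrary transformation class $\{X\}_n$, then $\mu^{(n)}_X(s) = \mu^{(n)}_Y(s)$, since $Y$ agrees with some $I_\sigma(X)$ on all inputs of length $n$ and $\sigma$ only permutes these inputs. The last two equations thus yield

$$\mu(s|\Phi) = \sum_{\{X\}_n \in \{\Phi\}_n}\mu(\{X\}_n|\Phi)\,\mu^{(n)}_X(s) = \sum_{\{X\}_n \in \{\Phi\}_n}\ \sum_{[Y]_n \in \{X\}_n}\ \sum_{Z \in [Y]_n}\mu(Z|\Phi)\,\mu^{(n)}_Z(s) = \sum_{U \in \Phi}\mu(U|\Phi)\,\mu^{(n)}_U(s). \quad \square$$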
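The key counting step in the proof of Theorem A.6, namely that averaging an arbitrary probability vector on $\{0,1\}^n$ over all $(2^n)!$ permutations gives the uniform value $2^{-n}$, can be checked with exact rational arithmetic. A minimal Python sketch for $n = 2$; the vector `p` and the toy computer `X` are arbitrary examples of ours, not quantities from the paper. The second check illustrates that the output frequency $\mu^{(n)}_X(s)$ does not change when the inputs of length $n$ are permuted.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def average_over_permutations(p, y):
    """(1/N!) * sum over all permutations sigma of p(sigma^{-1}(y)), where N = len(p)."""
    keys = sorted(p)
    total = Fraction(0)
    for image in permutations(keys):
        sigma = dict(zip(keys, image))                # sigma maps keys[i] to image[i]
        sigma_inv = {v: k for k, v in sigma.items()}
        total += p[sigma_inv[y]]
    return total / factorial(len(keys))

def output_frequency(X, n, s):
    """mu_X^(n)(s) = 2^-n * #{x in {0,1}^n : X(x) = s}."""
    inputs = ["".join(bits) for bits in product("01", repeat=n)]
    return Fraction(sum(1 for x in inputs if X(x) == s), 2 ** n)

def X(x):
    """Hypothetical computer, only evaluated on inputs of length 2."""
    return "1" if "1" in x else "0"

p = {"00": Fraction(1, 2), "01": Fraction(1, 4), "10": Fraction(1, 8), "11": Fraction(1, 8)}
print([average_over_permutations(p, y) for y in sorted(p)])   # four times Fraction(1, 4)

inputs = sorted(p)
print(all(
    output_frequency(lambda x, perm=dict(zip(inputs, image)): X(perm[x]), 2, "1")
    == output_frequency(X, 2, "1")
    for image in permutations(inputs)
))   # prints True
```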

This theorem is the promised analogue of the weighted-mean formula from the Introduction: it shows that the string probability that we have defined in Definition 4.1 is the weighted average of output frequency as defined in Definition 2.1. For a discussion of why this is interesting and surprising, see the first few paragraphs of this appendix.